Skip to main content

String Functions

SyntaxDescriptionExampleNormal QueryPushdown QueryDefined as Computed ColumnSuggested as Computed Column
CHAR_LENGTH(string)Returns the number of characters in a character stringCHAR_LENGTH('Kylin')
= 5
✔️✔️✔️✔️
CHARACTER_LENGTH(string)As CHAR_LENGTH(string) CHARACTER_LENGTH('Kylin')
= 5
✔️✔️✔️✔️
UPPER(string)Returns a character string converted to upper case UPPER('Kylin')
= KYLIN
✔️✔️✔️✔️
LOWER(string)Returns a character string converted to lower case LOWER('Kylin')
= kylin
✔️✔️✔️✔️
POSITION(string1 IN string2)Returns the position of the first occurrence of string1 in string2 POSITION('Kyli' IN 'Kylin')
= 1
✔️✔️✔️✔️
TRIM( { BOTH \ LEADING\ TRAILING } string1 FROM string2)Removes the longest string containing only the characters in string1 from the both ends/start/end of string1Example1:
TRIM(BOTH '6' FROM '666Kylin66')
= Kylin

Example 2:
TRIM(LEADING '6' FROM '666Kylin66')
= Kylin66

Example 3:
TRIM(TRAILING '6' FROM '666Kylin66')
= 666Kylin
✔️✔️✔️
OVERLAY(string1 PLACING string2 FROM integer [ FOR integer2 ])Replaces a substring of string1 with string2 starting at the integer bitOVERLAY('666' placing 'KYLIN' FROM 2 for 2)
= 6KYLIN
✔️✔️✔️✔️
SUBSTRING(string FROM integer)Returns a substring of a character string starting at a given point SUBSTRING('Kylin' FROM 5)
= n
✔️✔️✔️✔️
SUBSTRING(string FROM integer1 FOR integer2)Returns a substring of a character string starting at a given point with a given length SUBSTRING('KYlin' from 5 for 2)
= n
✔️✔️✔️✔️
INITCAP(string)Returns string with the first letter of each word converter to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric characters. INITCAP('kylin')
= Kylin
✔️✔️✔️✔️
REPLACE(string, search, replacement)Returns a string in which all the occurrences of search in string are replaced with replacement; if replacement is the empty string, the occurrences of search are removed REPLACE('Kylin','Kyli','Kyliiiiiii')
= Kyliiiiiiin
✔️✔️✔️✔️
BASE64(bin)Converts the argument from a binary bin to a base 64 string BASE64('Spark SQL')
= U3BhcmsgU1FM
✔️✔️✔️✔️
DECODE(bin, charset)Decodes the first argument using the second argument character set (one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null DECODE(ENCODE('abc', 'utf-8'), 'utf-8')
= abc
✔️✔️✔️✔️
ENCODE(str, charset)Encodes the first argument using the second argument character set(one of 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16'). If either argument is null, the result will also be null ENCODE('abc', 'utf-8')
= [B@2b442236
✔️✔️
FIND_IN_SET(str, str_array)Returns the index (1-based) of the given string (str) in the comma-delimited list (str_array).Returns 0, if the string was not found or if the given string (str) contains a comma FIND_IN_SET('ab','abc,b,ab,c,def')
= 3
✔️✔️✔️✔️
LCASE(string)Returns string with all characters changed to lowercaseLCASE('SparkSql')
= sparksql
✔️✔️✔️✔️
LEVENSHTEIN(str, str)Returns the Levenshtein distance between the two given strings LEVENSHTEIN('kitten', 'sitting')
= 3
✔️✔️✔️✔️
LOCATE(substr, str[, pos])Returns the position of the first occurrence of substr in str after position pos. The given pos and return value are 1-based LOCATE('bar', 'foobarbar')
= 4
LOCATE('bar', 'foobarbar', 5)
=7
✔️✔️✔️✔️
LPAD(str, len, pad)Returns str, left-padded with pad to a length of len.If str is longer than len, the return value is shortened to len characters. LPAD('hi', 5, '??')
= ???hi
✔️✔️✔️✔️
RPAD(str, len, pad)Returns str, right-padded with pad to a length of len. If str is longer than len, the return value is shortened to len characters RPAD('hi', 5, '??')
= hi???
✔️✔️✔️✔️
RTRIM(trimStr, str)Removes the trailing string which contains the characters from the trim string from the strRTRIM('KR', 'SPARK')
= SPA
✔️✔️✔️✔️
SENTENCES(str)Splits str into an array of array of words. SENTENCES('Hi there! Good morning.')
= [["Hi","there"],["Good","morning"]]
✔️✔️
SUBSTRING_INDEX(str, delim, count)Returns the substring from str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. The function substring_index performs a case-sensitive match when searching for delim SUBSTRING_INDEX('www.apache.org', '.', 1)
= www
✔️✔️✔️✔️
UCASE(str)Returns str with all characters changed to uppercase. UCASE('SparkSql')
= SPARKSQL
✔️✔️✔️✔️
UNBASE64(str)Converts the argument from a base 64 string str to a binary. UNBASE64('U3BhcmsgU1FM')
= [B@7c66820d
✔️✔️
ASCII(str)Convert character to corresponding ascii code ASCII('a') = 97✔️✔️✔️
CHR(str)Convert ascii code to corresponding character CHR(97) = a✔️✔️✔️
SPACE(len)Generate len number of continuous spacespace(2) = ✔️
SPLIT_PART(str, separator, index)Split str with separator and return the index-th token. index counts from 1. when index is negative, tokens are counted starting from the right.split_part('a-b-c', '-', 1) = a,
split_part('a-b-c', '-', -1) = c,
✔️✔️✔️
CONCAT(any, [any]*)Concatenates multiple data of any type into a stringconcat('Kyl', 'in') = Kylin✔️✔️✔️✔️
REPEAT(str,n)Repeat str n times and return string. When querying the model, str supports passing in constants, columns and expressions, and n only supports passing in constantsrepeat('kylin',2) ='kylinkylin'✔️✔️✔️✔️
LEFT(str,n)Return the n characters from the left of str. When querying the model, str supports passing in constants, columns, and n only supports passing in constantsleft('kylin',2) ='ky'✔️✔️✔️✔️