Skip to main content
Version: 5.0.0

Basic Functions

This chapter introduces basic functions applicable to SQL statements.

Arithmetic functions

SyntaxDescriptionExampleModelPush downComputed Column
ABS(num)Returns the absolute value of numABS(-2) = 2⎷️⎷️⎷️
CEIL(num)Rounds num up, returning the smallest integer that is greater than or equal to numCEIL(-2.2) = -2⎷️⎷️⎷️
FLOOR(num)Rounds num down, returning the largest integer that is less than or equal to numFLOOR(-2.2) = -3⎷️⎷️⎷️
MOD(num1, num2)Returns the remainder (modulus) of num1 divided by num2. The result is negative only if num1 is negativeMOD(-3, 2) = -1⎷️⎷️
SQRT(num)Returns the square root of numSQRT(16) = 4.0⎷️⎷️⎷️
CBRT(num)Returns the cube root of numCBRT(27.0) = 3.0⎷️⎷️⎷️
HYPOT(num1, num2)Returns the Euclidean distance of num1 and num2HYPOT(3, 4) = 5.0⎷️⎷️⎷️
LN(num)Returns the natural logarithm (base e) of numLN(2) = 0.6931471805599453⎷️⎷️⎷️
LOG(base, num)Returns the logarithm of num with baseLOG(10, 100) = 2.0⎷️⎷️⎷️
LOG10(num)Returns the base 10 logarithm of numericLOG10(100) = 2.0⎷️⎷️⎷️
LOG1P(num)Returns ln(1 + num)LOG1P(0) = 0.0⎷️⎷️⎷️
LOG2(num)Returns the logarithm of num with base 2LOG2(2) = 1.0⎷️⎷️⎷️
EXP(num)Returns e raised to the power of numEXP(1) = 2.718281828459045⎷️⎷️⎷️
EXPM1(num)Returns exp(num) - 1EXPM1(0) = 0.0⎷️⎷️⎷️
POWER(num1, num2)Returns num1 raised to the power of num2OWER(5,2) = 25.0⎷️⎷️⎷️
RAND([seed])Generates a random double between 0 and 1 inclusive, optionally initializing the random number generator with seedRAND(15) = 0.45951471073476047⎷️
COS(num)Returns the cosine of numCOS(5) = 0.28366218546322625⎷️⎷️⎷️
SIN(num)Returns the sine of numSIN(5) = -0.9589242746631385⎷️⎷️⎷️
TAN(num)Returns the tangent of numTAN(5) = -3.380515006246586⎷️⎷️⎷️
COT(num)Returns the cotangent of numCOT(5) = -0.2958129155327455⎷️⎷️⎷️
ACOS(num)Returns the arc cosine of numACOS(0.8) = 0.6435011087932843⎷️⎷️⎷️
ASIN(num)Returns the arc sine of numASIN(0.8) = 0.9272952180016123⎷️⎷️⎷️
ATAN(num)Returns the arc tangent of numATAN(0.8) = 0.6747409422235527⎷️⎷️⎷️
ATAN2(num1, num2)Returns the arc tangent of the num coordinatesATAN2(0.2, 0.8) = 0.24497866312686414⎷️⎷️⎷️
COSH(num)Returns the hyperbolic cosine of numCOSH(0) = 1.0⎷️⎷️⎷️
SINH(num)Returns hyperbolic sine of numSINH(0) = 0.0⎷️⎷️⎷️
TANH(num)Returns the hyperbolic tangent of numTANH(0) = 0.0⎷️⎷️⎷️
DEGREES(num)Converts num from radians to degreesDEGREES(5) = 286.4788975654116⎷️⎷️⎷️
PIReturns a value that is closer than any other value to piPI = 3.141592653589793⎷️
RADIANS(num)Converts num from degrees to radiansRADIANS(90) = 1.5707963267948966⎷️⎷️⎷️
BROUND(num1, int1)Returns num1 rounded to int1 decimal places using HALF_EVEN rounding modeBROUND(2.5, 0) = 2.0⎷️⎷️⎷️
ROUND(num1, int1)Returns num1 rounded to int1 decimal places using HALF_UP rounding modeROUND(2.5, 0) = 3⎷️⎷️⎷️
RINT(num)Returns the double value that is closest in value to the argument and is equal to a mathematical integerRINT(12.3456) = 12.0⎷️⎷️⎷️
SIGN(num)Returns the signum of numSIGN(-5) = -1.0⎷️⎷️⎷️
CONV(num, from_base, to_base)Convert num from base from_base to base to_baseCONV(-10, 16, -10) = -16⎷️⎷️⎷️
CONV(str, from_base, to_base)Convert str (implicit numeric type) from base from_base to base to_baseCONV('100', 2, 10) = 4⎷️⎷️⎷️
TRUNCATE(num1, num2)Truncates num1 to optionally num2(if not specified 0) places right to the decimal pointTRUNCATE(5.55555,2) = 5.5⎷️⎷️⎷️
FACTORIAL(num)Returns the factorial of num. The num is between 0 and 20. Otherwise, nullFACTORIAL(5) = 120⎷️⎷️⎷️
NOTE

If the parameter of the POWER function is a constant conversion, like CAST(2 AS DOUBLE), it may fail to be recommended as computed column.

Character functions

SyntaxDescriptionExampleModelPush DownComputed Column
CHAR_LENGTH(str)
CHARACTER_LENGTH(str)
Returns the number of characters in a character stringCHAR_LENGTH('Kylin') = 5

CHARACTER_LENGTH('Kylin') = 5
⎷️⎷️⎷️
UPPER(str)Returns a character string converted to upper caseUPPER('Kylin') = KYLIN⎷️⎷️⎷️
LOWER(str)Returns a character string converted to lower caseLOWER('Kylin') = kylin⎷️⎷️⎷️
POSITION(str1 IN str2)Returns the position of the first occurrence of str1 in str2POSITION('Kyli' IN 'Kylin') = 1⎷️⎷️⎷️
TRIM( { BOTH | LEADING | TRAILING } str1 FROM str2)Removes the longest string containing only the characters in str1 from the both ends/start/end of str2TRIM(BOTH '1' FROM '1Kylin1') = Kylin

TRIM(LEADING '1' FROM '1Kylin1') = Kylin1

TRIM(TRAILING '1' FROM '1Kylin1') = 1Kylin
⎷️⎷️⎷️
OVERLAY(str1 PLACING str2 FROM integer [ FOR pos ])Replaces a substring of str1 with str2 starting at the integer bit posOVERLAY('111' placing 'KYLIN' FROM 2 for 2) = 1KYLIN⎷️⎷️⎷️
SUBSTRING(str FROM pos)Returns a substring of a character string str starting at a given position posSUBSTRING('Kylin' FROM 5) = n⎷️⎷️⎷️
SUBSTRING(str FROM pos FOR len)Returns a substring of a character string str starting at a given position pos with a given length lenSUBSTRING('KYlin' from 5 for 2) = n⎷️⎷️⎷️
INITCAP(str)Returns str with the first letter of each word converter to upper case and the rest to lower case. Words are sequences of alphanumeric characters separated by non-alphanumeric charactersINITCAP('kylin') = Kylin⎷️⎷️⎷️
REPLACE(str, ptn, replacement)Returns a string in which all the occurrences of ptn in str are replaced with replacement; if replacement is the empty string, the occurrences of ptn are removedREPLACE('Kylin','Kyli','Kyliiiiiii') = Kyliiiiiiin⎷️⎷️⎷️
BASE64(bin)Converts the argument from a binary bin to a base 64 stringBASE64('Spark SQL') = U3BhcmsgU1FM⎷️⎷️⎷️
DECODE(bin, charset)Decodes the first argument bin using the given charset, charset is one of ['US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16']. If either argument is null, the result will also be nullDECODE(ENCODE('abc', 'utf-8'), 'utf-8') = abc⎷️⎷️⎷️
ENCODE(str, charset)Encodes the first argument str using the given charset, charset is one of ['US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16']. If either argument is null, the result will also be nullENCODE('abc', 'utf-8') = [B@2b442236⎷️⎷️
FIND_IN_SET(str, str_array)Returns the index (1-based) of the given string (str) in the comma-delimited list (str_array).Returns 0, if the string was not found or if the given string (str) contains a commaFIND_IN_SET('ab','abc,b,ab,c,def') = 3⎷️⎷️⎷️
LCASE(str)Returns str with all characters changed to lowercaseLCASE('SparkSql') = sparksql⎷️⎷️⎷️
LEVENSHTEIN(str1, str2)Returns the Levenshtein distance between the two given stringsLEVENSHTEIN('kitten', 'sitting') = 3⎷️⎷️⎷️
LOCATE(ptn, str[, pos])Returns the position of the first occurrence of ptn in str after position pos. The given pos and return value are 1-basedLOCATE('bar', 'foobarbar') = 4

LOCATE('bar', 'foobarbar', 5) =7
⎷️⎷️⎷️
LPAD(str, len, pad)Returns str, left-padded with pad to a length of len.If str is longer than len, the return value is shortened to len charactersLPAD('hi', 5, '??') = ???hi⎷️⎷️⎷️
RPAD(str, len, pad)Returns str, right-padded with pad to a length of len. If str is longer than len, the return value is shortened to len charactersRPAD('hi', 5, '??') = hi???⎷️⎷️⎷️
RTRIM(trimStr, str)Removes the trailing string which contains the characters from the trim string from the strRTRIM('KR', 'SPARK') = SPA⎷️⎷️⎷️
SENTENCES(str)Splits str into an array of array of words.SENTENCES('Hi there! Good morning.') = [["Hi","there"],["Good","morning"]]⎷️⎷️
SUBSTRING_INDEX(str, delim, count)Returns the substring from str before count occurrences of the delimiter delim. If count is positive, everything to the left of the final delimiter (counting from the left) is returned. If count is negative, everything to the right of the final delimiter (counting from the right) is returned. The function substring_index performs a case-sensitive match when searching for delimSUBSTRING_INDEX('www.apache.org', '.', 1) = www⎷️⎷️⎷️
UCASE(str)Returns str with all characters changed to uppercaseUCASE('SparkSql') = SPARKSQL⎷️⎷️⎷️
UNBASE64(str)Converts the argument from a base 64 string str to a binaryUNBASE64('U3BhcmsgU1FM') = [B@7c66820d⎷️⎷️
ASCII(str)Convert character to corresponding ascii codeASCII('a') = 97⎷️⎷️
CHR(str)Convert ascii code to corresponding characterCHR(97) = a⎷️⎷️
SPACE(len)Generate len number of continuous spacespace(2) = □□

□ means one space
⎷️
SPLIT_PART(str, separator, index)Split str with separator and return the index-th token. index counts from 1. when index is negative, tokens are counted starting from the rightSPLIT_PART('a-b-c', '-', 1) = a

SPLIT_PART('a-b-c', '-', -1) = c
⎷️⎷️
CONCAT(any[, any]*)Concatenates multiple data of any type into a stringCONCAT('Kyl', 'in') = Kylin⎷️⎷️⎷️
REPEAT(str,n)Repeat str with n times and return string. When querying the model, str supports passing in constants, columns and expressions, and n only supports passing in constantsREPEAT('kylin',2) = kylinkylin⎷️⎷️⎷️
LEFT(str,n)Return the n characters from the left of str. When querying the model, str supports passing in constants, columns, and n only supports passing in constantsLEFT('kylin',2) = ky⎷️⎷️⎷️

Date/time functions

SyntaxDescriptionExampleModelPush DownComputed Column
CURRENT_DATEReturns the current date in the session time zone, in a value of datatype DATECURRENT_DATE = 2018-10-10⎷️⎷️
CURRENT_TIMESTAMPReturns the current timestamp in the session time zone, in a value of datatype timestampCURRENT_TIMESTAMP = 2022-07-19 10:24:44.708⎷️⎷️
EXTRACT(timeUnit FROM datetime)Extracts and returns the value of a specified datetime field from a datetime value expressionEXTRACT(minute FROM timestamp '2018-10-10 11:47:16') = 47⎷️⎷️⎷️
FLOOR(datetime TO timeUnit)Rounds datetime down to timeUnit where timeunit can be one of [year, quarter, month, week, day, hour, minute, second]FLOOR(timestamp '2018-10-10 11:47:16' TO year) = 2018-01-01 00:00:00⎷️⎷️⎷️
CEIL(datetime TO timeUnit)Rounds datetime up to timeUnit where timeunit can be one of [year, quarter, month, week, day, hour, minute, second]CEIL(timestamp '2018-10-10 11:47:16' TO minute) = 2018-10-10 11:48:00⎷️⎷️⎷️
YEAR(date)Equivalent to EXTRACT(YEAR FROM date). Returns an integerYEAR(date '2019-01-02') = 2019⎷️⎷️⎷️
QUARTER(date)Equivalent to EXTRACT(QUARTER FROM date). Returns an integer between 1 and 4QUARTER(date '2019-01-02') = 1⎷️⎷️⎷️
NOW(date)Returns the current timestamp at the start of query evaluationNOW() = 2019-09-24 17:19:09.932⎷️⎷️
MONTH(date)Equivalent to EXTRACT(MONTH FROM date). Returns an integer between 1 and 12MONTH(date '2019-01-02') = 1⎷️⎷️⎷️
WEEK(date)Equivalent to EXTRACT(WEEK FROM date). Returns an integer between 1 and 53WEEK(date '2019-01-02') = 1⎷️⎷️⎷️
DAYOFYEAR(date)Equivalent to EXTRACT(DOY FROM date). Returns an integer between 1 and 366DAYOFYEAR(date '2019-10-03') = 276⎷️⎷️⎷️
DAYOFMONTH(date)Equivalent to EXTRACT(DAY FROM date). Returns an integer between 1 and 31DAYOFMONTH(date '2019-10-03') = 3⎷️⎷️⎷️
DAYOFWEEK(date)Equivalent to EXTRACT(DOW FROM date). Returns an integer between 1 and 7DAYOFWEEK(date '2019-10-03') = 5⎷️⎷️⎷️
HOUR(date)Equivalent to EXTRACT(HOUR FROM date). Returns an integer between 0 and 23HOUR(timestamp '2019-01-02 14:01:50') = 14⎷️⎷️⎷️
MINUTE(date)Equivalent to EXTRACT(MINUTE FROM date). Returns an integer between 0 and 59MINUTE(timestamp '2019-01-02 14:01:50') = 1⎷️⎷️⎷️
SECOND(date)Equivalent to EXTRACT(SECOND FROM date). Returns an integer between 0 and 59SECOND(timestamp '2019-01-02 14:01:50') = 50⎷️⎷️⎷️
TIMESTAMPADD(timeUnit, integer, datetime)Returns datetime with an interval of (signed) integer timeUnits added. Equivalent to 'datetime + INTERVAL integer timeUnit'TIMESTAMPADD(month, 1, '2018-10-10') = 2018-11-10⎷️⎷️⎷️
TRUNC(date, fmt)Returns date with the time portion of the day truncated to the unit specified by the format model fmt. fmt should be one of [year, yyyy, yy, mon, month, mm]TRUNC(date '2009-02-12', 'MM') = 2009-02-01⎷️⎷️⎷️
ADD_MONTHS(start_date, num_months)Returns the date that is num_months after start_dateADD_MONTHS(date '2016-08-31', 1) = 2016-09-30⎷️⎷️⎷️
DATE_ADD(start_date, num_days)Returns the date that is num_days after start_dateDATE_ADD(date '2016-07-30', 1) = 2016-07-31⎷️⎷️⎷️
DATE_SUB(start_date, num_days)Returns the date that is num_days before start_dateDATE_SUB(date '2016-07-30', 1) = 2016-07-29⎷️⎷️⎷️
FROM_UNIXTIME(unix_time, format)Returns unix_time in the specified formatFROM_UNIXTIME(0, 'yyyy-MM-dd HH:mm:ss') = 1970-01-01 00:00:00⎷️⎷️⎷️
FROM_UTC_TIMESTAMP(timestamp, timezone)Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in UTC, and renders that time as a timestamp in the given time zoneFROM_UTC_TIMESTAMP('2016-08-31', 'Asia/Seoul') = 2016-08-31 09:00:00⎷️⎷️⎷️
MONTHS_BETWEEN(timestamp1, timestamp2)If timestamp1 is later than timestamp2, then the result is positive. If timestamp1 and timestamp2 are on the same day of month, or both are the last day of month, time of day will be ignored. Otherwise, the difference is calculated based on 31 days per monthMONTHS_BETWEEN('1997-02-28 10:30:00', '1996-10-30') = 3.94959677⎷️⎷️⎷️
TO_UTC_TIMESTAMP(timestamp, timezone)Given a timestamp like '2017-07-14 02:40:00.0', interprets it as a time in the given time zone, and renders that time as a timestamp in UTCFROM_UTC_TIMESTAMP(timestamp '2015-03-02 06:05:00', 'America/Toronto') = 2015-03-02 01:05:00.0⎷️⎷️
UNIX_TIMESTAMP(datetime, fmt)Returns the seconds from '1970-01-01 08:00:00' to datetime. Please specify the format fmt of datetimeUNIX_TIMESTAMP('2016-04-08 09:00:00', 'yyyy-MM-dd HH:mm:ss') = 1460106000⎷️⎷️⎷️
DATEDIFF(endDate, startDate)Returns the number of days from startDate to endDateDATEDIFF(date '2022-02-03', date '2022-02-01') = 2⎷️⎷️⎷️

Conditional functions

SyntaxDescriptionExampleModelPush DownComputed Column
CASE value WHEN value1 THEN result1 WHEN valueN THEN resultN ELSE resultZ ENDSimple caseCASE OPS_REGION WHEN 'Beijing' THEN 'BJ' WHEN 'Shanghai' THEN 'SH' WHEN 'Hongkong' THEN 'HK' END FROM KYLIN_SALES = HK SH BJ⎷️⎷️⎷️
CASE WHEN condition1 THEN result1 WHEN conditionN THEN resultN ELSE resultZ ENDSearched caseCASE WHEN OPS_REGION ='Beijing'THEN 'BJ' WHEN OPS_REGION ='Shanghai' THEN 'SH' WHEN OPS_REGION = 'Hongkong' THEN 'HK' END FROM KYLIN_SALES = HK SH BJ⎷️⎷️⎷️
NULLIF(value, value)Return NULL if the values are the same. Otherwise, return the first valueNULLIF(5,5) = null⎷️⎷️⎷️
COALESCE(value, value [, value ]*)Return the first not null valueCOALESCE(NULL,NULL,5) = 5⎷️⎷️⎷️
IFNULL(value1, value2)Return value2 if value1 is NULL. Otherwise, return value1IFNULL('kylin','apache') = 'kylin'⎷️⎷️⎷️
ISNULL(value)Return true if value is NULL. Otherwise, return falseISNULL('kylin') = false⎷️⎷️⎷️
NVL(value1, value2)Return value2 if value1 is NULL. Otherwise, return value1. Value1, value2 must have same data typeNVL('kylin','apache') = 'kylin'⎷️⎷️⎷️

Miscellaneous functions

SyntaxDescriptionExampleModelPush DownComputed Column
UUID()Returns an universally unique identifier (UUID) string. The value is returned as a canonical UUID 36-character stringUUID() = 46707d92-02f4-4817-8116-a4c3b23e6266⎷️
MONOTONICALLY
_INCREASING_ID()
Returns monotonically increasing 64-bit integers. The generated ID is guaranteed to be monotonically increasing and unique, but not consecutiveMONOTONICALLY_INCREASING_ID() = 1111111⎷️⎷️
EXPLODE(array)Returns new rows for each element in the given array columnEXPLODE(array[1, 2, 3]) =
1
2
3
⎷️
SIZE(expr)Expr must be of type array or map and return the number of elements contained in array or mapSIZE(array[1, 2, 3]) = 3⎷️⎷️

Type conversion functions

SyntaxDescriptionExampleModelPush DownComputed Column
CAST(value AS type)Converts a value to a given typeCAST(CURRENT_DATE as varchar) = 2018-10-10⎷️⎷️⎷️
DATE<String>Converts a string to DATE type,
equaling to CAST(string AS date)
DATE '2018-10-10' = 2018-10-10

MONTH(DATE '2018-10-10') = 10
⎷️⎷️
TIMESTAMP<String>Converts a string to TIMESTAMP type,
equaling to CAST(string AS timestamp)
TIMESTAMP '2018-10-10 15:57:07 = 2018-10-10 15:57:07

SECOND(TIMESTAMP '2018-10-10 15:57:07') = 7
⎷️⎷️
NOTE
  1. Only the following types of conversions are supported: char, varchar, boolean, int, integer, tinyint, smallint, bigint, float, double, decimal, numeric, date, time, timestamp
  2. However, conversion from bigint to timestamp is not supported at this time
  3. If converting from char to int, non-numeric values in char will return null
  4. Length n does not take effect when other data types are converted to char(n) or varchar(n)

Aggregate functions

SyntaxDescriptionExampleModelPush DownComputed Column
AVG(numeric)Returns the average (arithmetic mean) of numeric across all input valuesSELECT AVG(PRICE) FROM KYLIN_SALES;

49.23855638491023
⎷️⎷️
SUM(numeric)Returns the sum of numeric across all input valuesSELECT SUM(PRICE) FROM KYLIN_SALES;

244075.5240
⎷️⎷️
MAX(value)Returns the maximum value of value across all input valuesSELECT MAX(PRICE) FROM KYLIN_SALES;

99.9865
⎷️⎷️
MIN(value)Returns the minimum value of value across all input valuesSELECT MIN(PRICE) FROM KYLIN_SALES;

0.0008
⎷️⎷️
COUNT(value)Returns the number of input rows for which value is not null (wholly not null if value is composite)SELECT count(PRICE) FROM KYLIN_SALES;

4957
⎷️⎷️
COUNT(*)Returns the number of input rowsSELECT COUNT(*) FROM KYLIN_COUNTRY;

244
⎷️⎷️
CORR(value1, value2)Returns the correlation of two columnsSELECT CORR(ITEM_COUNT, PRICE) FROM KYLIN_SALES

0.1278
⎷️⎷️