SQL Reference
Apache Kylin relies on Apache Calcite to parse and optimize the SQL statements. As an OLAP engine, Kylin supports SELECT
statements, while doesn't support others like INSERT
, UPDATE
and DELETE
operations in SQL, so Kylin's SQL grammer is a subset of Apache Calcite. This page lists the SQL grammar, the functions and the basic data types that Kylin supports. You can also check Calcite SQL reference for more detailed info.
Grammarβ
QUERY SYNTAX SELECT γSTATEMENT γEXPRESSION SUBQUERY JOIN γINNER JOIN γLEFT JOIN UNION UNION ALL
Functionβ
COUNT γCOUNT(COLUMN) γCOUNT(*) COUNT_DISTINCT MAX MIN PERCENTILE SUM TOP_N WINDOW γROW_NUMBER γAVG γRANK γDENSE_RANK γFIRST_VALUE γLAST_VALUE γLAG γLEAD γNTILE γCASE WHEN γCAST SUSTRING COALESCE STDDEV_SUM INTERSECT_COUNT INTERSECT_VALUE
Data Typeβ
Sql Execution Planβ
You can using explain plan for
to displays execution plan, such as:
explain plan for select count(*) from KYLIN_SALES
The front-end displays the string of a one line execution plan. It is better to export the execution plan through the result export to view it.
QUERY SYNTAXβ
statement:
| query
query:
values
| WITH withItem [ , withItem ]* query
| {
select
| selectWithoutFrom
| query UNION [ ALL | DISTINCT ] query
| query INTERSECT [ ALL | DISTINCT ] query
}
[ ORDER BY orderItem [, orderItem ]* ]
[ LIMIT { count | ALL } ]
[ OFFSET start { ROW | ROWS } ]
[ FETCH { FIRST | NEXT } [ count ] { ROW| ROWS } ]
withItem:
name
['(' column [, column ]* ')' ]
AS '(' query ')'
orderItem:
expression [ ASC | DESC ]οΌ» NULLS FIRST |NULLS LAST οΌ½
select:
SELECT [ ALL | DISTINCT]
{ * | projectItem [, projectItem ]* }
FROM tableExpression
[ WHERE booleanExpression ]
[ GROUP BY { groupItem [, groupItem ]* }]
[ HAVING booleanExpression ]
[ WINDOW windowName AS windowSpec [,windowName AS windowSpec ]* ]
selectWithoutFrom:
SELECT [ ALL | DISTINCT ]
{ * | projectItem [, projectItem ]* }
projectItem:
expression [ [ AS ] columnAlias ]
| tableAlias . *
tableExpression:
tableReference [, tableReference ]*
| tableExpression [ NATURAL ]οΌ»( LEFT | RIGHT | FULL ) [ OUTER ] οΌ½ JOIN tableExpression [ joinCondition ]
joinCondition:
ON booleanExpression
| USING '(' column [, column ]* ')'
tableReference:
tablePrimary
[ matchRecognize ]
[ [ AS ] alias [ '(' columnAlias [,columnAlias ]* ')' ] ]
tablePrimary:
[ [ catalogName . ] schemaName . ] tableName
'(' TABLE [ [ catalogName . ] schemaName. ] tableName ')'
| [ LATERAL ] '(' query ')'
| UNNEST '(' expression ')' [ WITH ORDINALITY ]
| [ LATERAL ] TABLE '(' [ SPECIFIC ] functionName '(' expression [, expression ]*')' ')'
values:
VALUES expression [, expression ]*
groupItem:
expression
| '('')'
| '('expression [, expression ]* ')'
| GROUPING SETS '(' groupItem [, groupItem ]* ')'
windowRef:
windowName
| windowSpec
windowSpec:
[windowName ]
'('
[ ORDER BY orderItem [, orderItem ]* ]
[ PARTITION BY expression [, expression]* ]
[
RANGE numericOrIntervalExpression { PRECEDING | FOLLOWING }
| ROWS numericExpression { PRECEDING | FOLLOWING }
]
')'
SELECTβ
STATEMENTβ
SELECT
chooses the data from the table. COUNT
is used for quantitative statistics. DISTINCT
filters out duplicate results. AS
is used to alias tables or columns. FROM
identifies the table being queried. JOIN
is used to connect two tables to get the desired data. WHERE
is used to specify the standard of selection. LIKE
is used to search for a specified pattern in a column in a WHERE
clause. BETWEEN ... AND
is used to select a range of data between two values. AND
and OR
is used to filter records based on more than one condition. GROUP BY
groups the result by the given expression(s). HAVING
filters rows after grouping. ORDER BY
sorts the result based on the given expression, usually uses with TOPN
function. LIMIT
limits the number of rows returned by the query.
Example:
SELECT COUNT(*) FROM kylin_sales;
SELECT COUNT(DISTINCT seller_id) FROM kylin_sales;
SELECT seller_id, COUNT(1) FROM kylin_sales GROUP BY seller_id;
SELECT lstg_format_name, SUM(price) AS gmv, COUNT(DISTINCT seller_id) AS dist_seller FROM kylin_sales WHERE lstg_format_name='FP-GTC' GROUP BY lstg_format_name HAVING COUNT(DISTINCT seller_id) > 50;
SELECT lstg_format_name FROM kylin_sales INNER JOIN kylin_cal_dt AS kylin_cal_dt ON kylin_sales.part_dt = kylin_cal_dt.cal_dt WHERE NOT(lstg_format_name NOT LIKE '%ab%') GROUP BY lstg_format_name;
SELECT kylin_cal_dt.cal_dt FROM kylin_sales INNER JOIN kylin_cal_dt AS kylin_cal_dt ON kylin_sales.part_dt = kylin_cal_dt.cal_dt WHERE kylin_cal_dt.cal_dt BETWEEN DATE '2013-01-01' AND DATE '2013-06-04';
SELECT kylin_sales.lstg_format_name, SUM(kylin_sales.price) AS gmv, COUNT(*) AS trans_cnt FROM kylin_sales WHERE kylin_sales.lstg_format_name IS NULL GROUP BY kylin_sales.lstg_format_name HAVING SUM(price)>5000 AND COUNT(*)>72;
SELECT kylin_sales.lstg_format_name, SUM(kylin_sales.price) AS gmv, COUNT(*) AS trans_cnt FROM kylin_sales WHERE kylin_sales.lstg_format_name IS NOT NULL GROUP BY kylin_sales.lstg_format_name HAVING SUM(price)>5000 OR COUNT(*)>20;
SELECT lstg_format_name, SUM(price) AS gmv, COUNT(1) AS trans_cnt FROM kylin_sales WHERE lstg_format_name='FP-GTC' GROUP BY lstg_format_name ORDER BY lstg_format_name LIMIT 10;
EXPRESSIONβ
An expression in a SELECT
statement. All columns in a table may be selected using *.
Example:
- *
- ID AS VALUE
- VALUE + 1
SUBQUERYβ
Example:
SELECT cal_dt ,sum(price) AS sum_price FROM (SELECT kylin_cal_dt.cal_dt, kylin_sales.price FROM kylin_sales INNER JOIN kylin_cal_dt AS kylin_cal_dt ON kylin_sales.part_dt = kylin_cal_dt.cal_dt INNER JOIN kylin_category_groupings ON kylin_sales.leaf_categ_id = kylin_category_groupings.leaf_categ_id AND kylin_sales.lstg_site_id = kylin_category_groupings.site_id) t GROUP BY cal_dt;
JOINβ
INNER JOINβ
The INNER JOIN
keyword returns rows when there is at least one match in the table.
Example:
SELECT kylin_cal_dt.cal_dt, kylin_sales.price FROM kylin_sales INNER JOIN kylin_cal_dt AS kylin_cal_dt ON kylin_sales.part_dt = kylin_cal_dt.cal_dt;
LEFT JOINβ
The LEFT JOIN
keyword returns all rows from the left table (kylin_sales), even if there are no matching rows in the right table (kylin_category_groupings).
Example:
SELECT seller_id FROM kylin_sales LEFT JOIN kylin_category_groupings AS kylin_category_groupings ON kylin_sales.leaf_categ_id = kylin_category_groupings.leaf_categ_id AND kylin_sales.lstg_site_id = kylin_category_groupings.site_id WHERE lstg_format_name='FP-GTC' GROUP BY seller_id LIMIT 20;
UNIONβ
The UNION
operator is used to combine the result sets of two or more SELECT
statements.
Note that the SELECT
statement inside UNION
must have the same number of columns. Columns must also have similar data types. At the same time, the order of the columns in each SELECT
statement must be the same.
By default, the UNION
operator picks a different value. If you allow duplicate values, use UNION ALL
.
Example:
SELECT SUM(price) AS x, seller_id, COUNT(*) AS y FROM kylin_sales WHERE part_dt < DATE '2012-08-01' GROUP BY seller_id UNION (SELECT SUM(price) AS x, seller_id, COUNT(*) AS y FROM kylin_sales WHERE part_dt > DATE '2012-12-01' GROUP BY seller_id);
UNION ALLβ
The UNION ALL
command is almost equivalent to the UNION
command, but the UNION ALL
command lists all values.
Example:
SELECT COUNT(trans_id) AS trans_id FROM kylin_sales AS test_a WHERE trans_id <> 1 UNION ALL SELECT COUNT(trans_id) AS trans_id FROM kylin_sales AS test_b;
COUNTβ
Returns the number of rows matching the specified criteria.
COUNT(COLUMN)β
Example:
SELECT COUNT(seller_id) FROM kylin_sales;
COUNT(*)β
Example:
SELECT COUNT(*) FROM kylin_sales;
COUNT_DISTINCTβ
Example:
SELECT COUNT(DISTINCT seller_id) AS DIST_SELLER FROM kylin_sales;
MAXβ
Returns the maximum value in a column. NULL values are not included in the calculation. Example:
SELECT MAX(lstg_site_id) FROM kylin_sales;
MINβ
Returns the minimum value in a column. NULL values are not included in the calculation. Example:
SELECT MIN(lstg_site_id) FROM kylin_sales;
PERCENTILEβ
Example:
SELECT seller_id, PERCENTILE(price, 0.5) FROM kylin_sales GROUP BY seller_id;
SELECT seller_id, PERCENTILE_APPROX(price, 0.5) FROM kylin_sales GROUP BY seller_id;
SUMβ
Returns the total number of numeric columns. Example:
SELECT SUM(price) FROM kylin_sales;
TOP_Nβ
Example:
SELECT SUM(price) AS gmv
FROM kylin_sales
INNER JOIN kylin_cal_dt AS kylin_cal_dt
ON kylin_sales.part_dt = kylin_cal_dt.cal_dt
INNER JOIN kylin_category_groupings
ON kylin_sales.leaf_categ_id = kylin_category_groupings.leaf_categ_id AND kylin_sales.lstg_site_id = kylin_category_groupings.site_id
WHERE kylin_cal_dt.cal_dt between DATE '2013-09-01' AND DATE '2013-10-01' AND (lstg_format_name='FP-GTC' OR 'a' = 'b')
GROUP BY kylin_cal_dt.cal_dt;
SELECT kylin_sales.part_dt, seller_id
FROM kylin_sales
INNER JOIN kylin_cal_dt AS kylin_cal_dt
ON kylin_sales.part_dt = kylin_cal_dt.cal_dt
INNER JOIN kylin_category_groupings
ON kylin_sales.leaf_categ_id = kylin_category_groupings.leaf_categ_id
AND kylin_sales.lstg_site_id = kylin_category_groupings.site_id
GROUP BY
kylin_sales.part_dt, kylin_sales.seller_id ORDER BY SUM(kylin_sales.price) DESC LIMIT 20;
WINDOWβ
The WINDOW
function performs the calculation on a set of table rows associated with the current row.
NOTE: OVER
clause is necessary for window functions.
ROW_NUMBERβ
Example:
SELECT lstg_format_name, SUM(price) AS gmv, ROW_NUMBER() OVER() FROM kylin_sales GROUP BY lstg_format_name;
AVGβ
Returns the average of the numeric columns. NULL values are not included in the calculation. Example:
SELECT lstg_format_name, AVG(SUM(price)) OVER(PARTITION BY lstg_format_name) FROM kylin_sales GROUP BY part_dt, lstg_format_name;
RANKβ
Example:
SELECT part_dt, lstg_format_name, SUM(price), RANK() OVER(PARTITION BY lstg_format_name ORDER BY part_dt) AS "rank" FROM kylin_sales GROUP BY part_dt, lstg_format_name;
DENSE_RANKβ
Example:
SELECT part_dt, lstg_format_name, SUM(price), DENSE_RANK() OVER(PARTITION BY lstg_format_name ORDER BY part_dt) AS "dense_rank" FROM kylin_sales GROUP BY part_dt, lstg_format_name;
FIRST_VALUEβ
Example:
SELECT part_dt, lstg_format_name, SUM(price) AS gmv, FIRST_VALUE(SUM(price)) OVER(PARTITION BY lstg_format_name ORDER BY part_dt) AS "first" FROM kylin_sales WHERE part_dt < '2012-02-01' GROUP BY part_dt, lstg_format_name;
LAST_VALUEβ
Example:
SELECT part_dt, lstg_format_name, SUM(price) AS gmv, LAST_VALUE(SUM(price)) OVER(PARTITION BY lstg_format_name ORDER BY part_dt) AS "current" FROM kylin_sales WHERE part_dt < '2012-02-01' GROUP BY part_dt, lstg_format_name;
LAGβ
Example:
SELECT part_dt, lstg_format_name, SUM(price) AS gmv, LAG(SUM(price), 1, 0.0) OVER(PARTITION BY lstg_format_name ORDER BY part_dt) AS "prev" FROM kylin_sales WHERE part_dt < '2012-02-01' GROUP BY part_dt, lstg_format_name;
LEADβ
Example:
SELECT part_dt, lstg_format_name, SUM(price) AS gmv, LEAD(SUM(price), 1, 0.0) OVER(PARTITION BY lstg_format_name ORDER BY part_dt) AS "next" FROM kylin_sales WHERE part_dt < '2012-02-01' GROUP BY part_dt, lstg_format_name;
NTILEβ
Example:
SELECT part_dt, lstg_format_name, SUM(price) AS gmv, NTILE(4) OVER (PARTITION BY lstg_format_name ORDER BY part_dt) AS "quarter" FROM kylin_sales WHERE part_dt < '2012-02-01' GROUP BY part_dt, lstg_format_name;
CASE WHENβ
Example:
SELECT part_dt, lstg_format_name, SUM(price) AS gmv, (CASE LAG(SUM(price), 1, 0.0) OVER(PARTITION BY lstg_format_name ORDER BY part_dt) WHEN 0.0 THEN 0 ELSE SUM(price)/LAG(SUM(price), 1, 0.0) OVER(PARTITION BY lstg_format_name ORDER BY part_dt) END) AS "prev" FROM kylin_sales WHERE part_dt < '2012-02-01' GROUP BY part_dt, lstg_format_name;
CASTβ
The keyword RANGE
, INTERVAL
to specify a range. PRECEDING
means the first few days (second/minute/hour/month/year). FOLLOWING
means the next few days (second/minute/hour/month/year).
Example:
SELECT part_dt, lstg_format_name, SUM(price) AS gmv, FIRST_VALUE(SUM(price)) OVER (PARTITION BY lstg_format_name ORDER BY CAST(part_dt AS timestamp) RANGE INTERVAL '3' DAY PRECEDING) AS "prev 3 days", LAST_VALUE(SUM(price)) OVER (PARTITION BY lstg_format_name ORDER BY CAST(part_dt AS timestamp) RANGE INTERVAL '3' DAY FOLLOWING) AS "next 3 days" FROM kylin_sales WHERE part_dt < '2012-02-01' GROUP BY part_dt, lstg_format_name;
SUBSTRINGβ
Example:
SELECT SUBSTRING(lstg_format_name, 1) FROM kylin_sales;
COALESCEβ
Example:
SELECT COALESCE(lstg_format_name, '888888888888') FROM kylin_sales;
STDDEV_SUMβ
Example: The first query is shortcut of the second query. stddev_sum
is a UDAF which introduced in KYLIN-3361.
select A, stddev_sum(sampling_dim, m)
from T
group by A
select A, stddev(SUM_M)
from (
select A, sampling_dim, sum(m) as SUM_M
from T
group by A, sampling_dim
) a
group by A
INTERSECT_COUNTβ
INTERSECT_COUNT is used to calculate the retention rate. The measure to be calculated have defined precisely count distinct measure. Example 1: Refer to intersect_count
select city, version,
intersect_count(uuid, dt, array['20161014']) as first_day,
intersect_count(uuid, dt, array['20161015']) as second_day,
intersect_count(uuid, dt, array['20161016']) as third_day,
intersect_count(uuid, dt, array['20161014', '20161015']) as retention_oneday,
intersect_count(uuid, dt, array['20161014', '20161015', '20161016']) as retention_twoday
from visit_log
where dt in ('2016104', '20161015', '20161016')
group by city, version
Example 1: Refer to KYLIN-4314
select city, version,
intersect_count(uuid, dt, array['20161014']) as first_day,
intersect_count(uuid, dt, array['20161015']) as second_day,
intersect_count(uuid, dt, array['20161016']) as third_day,
intersect_count(uuid, dt, array['20161014', '20161015']) as retention_oneday,
intersect_count(uuid, dt, array['20161014|20161015', '20161016']) as retention_twoday
from visit_log
where dt in ('2016104', '20161015', '20161016')
group by city, version
INTERSECT_VALUEβ
INTERSECT_COUNT returns the bitmap details of the retained value. The measure to be calculated have defined precisely count distinct measure. ExampleοΌ
select city, version,
intersect_value(uuid, dt, array['20161014']) as first_day,
intersect_value(uuid, dt, array['20161015']) as second_day,
intersect_value(uuid, dt, array['20161016']) as third_day,
intersect_value(uuid, dt, array['20161014', '20161015']) as retention_oneday,
intersect_value(uuid, dt, array['20161014|20161015', '20161016']) as retention_twoday
from visit_log
where dt in ('2016104', '20161015', '20161016')
group by city, version
DATA TYPEβ
| ---------- | ---------- | ---------- | ---------- | -------------------- | | ANY | CHAR | VARCHAR | STRING | BOOLEAN | | BYTE | BINARY | INT | SHORT | LONG | | INTEGER | TINYINT | SMALLINT | BIGINT | TIMESTAMP | | FLOAT | REAL | DOUBLE | DECIMAL | DATETIME | | NUMERIC | DATE | TIME | | |