Version: 3.1.3

SQL Reference

Apache Kylin relies on Apache Calcite to parse and optimize the SQL statements. As an OLAP engine, Kylin supports SELECT statements, while doesn't support others like INSERT, UPDATE and DELETE operations in SQL, so Kylin's SQL grammer is a subset of Apache Calcite. This page lists the SQL grammar, the functions and the basic data types that Kylin supports. You can also check Calcite SQL reference for more detailed info.

Grammar

QUERY SYNTAX SELECT 　STATEMENT 　EXPRESSION SUBQUERY JOIN 　INNER JOIN 　LEFT JOIN UNION UNION ALL

Function

COUNT 　COUNT(COLUMN) 　COUNT(*) COUNT_DISTINCT MAX MIN PERCENTILE SUM TOP_N WINDOW 　ROW_NUMBER 　AVG 　RANK 　DENSE_RANK 　FIRST_VALUE 　LAST_VALUE 　LAG 　LEAD 　NTILE 　CASE WHEN 　CAST SUSTRING COALESCE STDDEV_SUM INTERSECT_COUNT INTERSECT_VALUE

Data Type

DATA TYPE

Sql Execution Plan

You can using explain plan for to displays execution plan, such as:

explain plan for select count(*) from KYLIN_SALES

The front-end displays the string of a one line execution plan. It is better to export the execution plan through the result export to view it.

QUERY SYNTAX

statement:
|  query

query:
      values
  |  WITH withItem [ , withItem ]* query
  |   {
          select
      |  selectWithoutFrom
      |  query UNION [ ALL | DISTINCT ] query
      |  query INTERSECT [ ALL | DISTINCT ] query
      }
      [ ORDER BY orderItem [, orderItem ]* ]
      [ LIMIT { count | ALL } ]
      [ OFFSET start { ROW | ROWS } ]
      [ FETCH { FIRST | NEXT } [ count ] { ROW| ROWS } ]

withItem:
      name
      ['(' column [, column ]* ')' ]
      AS '(' query ')'

orderItem:
      expression [ ASC | DESC ]［ NULLS FIRST |NULLS LAST ］

select:
      SELECT [ ALL | DISTINCT]
          { * | projectItem [, projectItem ]* }
      FROM tableExpression
      [ WHERE booleanExpression ]
      [ GROUP BY { groupItem [, groupItem ]* }]
      [ HAVING booleanExpression ]
      [ WINDOW windowName AS windowSpec [,windowName AS windowSpec ]* ]

selectWithoutFrom:
      SELECT [ ALL | DISTINCT ]
          { * | projectItem [, projectItem ]* }

projectItem:
      expression [ [ AS ] columnAlias ]
  |  tableAlias . *

tableExpression:
      tableReference [, tableReference ]*
  |  tableExpression [ NATURAL ]［( LEFT | RIGHT | FULL ) [ OUTER ] ］ JOIN tableExpression [ joinCondition ]

joinCondition:
      ON booleanExpression
  |  USING '(' column [, column ]* ')'

tableReference:
      tablePrimary
      [ matchRecognize ]
      [ [ AS ] alias [ '(' columnAlias [,columnAlias ]* ')' ] ]

tablePrimary:
      [ [ catalogName . ] schemaName . ] tableName
      '(' TABLE [ [ catalogName . ] schemaName. ] tableName ')'
  |   [ LATERAL ] '(' query ')'
  |  UNNEST '(' expression ')' [ WITH ORDINALITY ]
  |   [ LATERAL ] TABLE '(' [ SPECIFIC ] functionName '(' expression [, expression ]*')' ')'

values:
      VALUES expression [, expression ]*

groupItem:
      expression
  |   '('')'
  |   '('expression [, expression ]* ')'
  |  GROUPING SETS '(' groupItem [, groupItem ]* ')'

windowRef:
      windowName
  |  windowSpec

windowSpec:
      [windowName ]
      '('
      [ ORDER BY orderItem [, orderItem ]* ]
      [ PARTITION BY expression [, expression]* ]
      [
          RANGE numericOrIntervalExpression { PRECEDING | FOLLOWING }
      |  ROWS numericExpression { PRECEDING | FOLLOWING }
      ]
    ')'
 

SELECT

STATEMENT

SELECT chooses the data from the table. COUNT is used for quantitative statistics. DISTINCT filters out duplicate results. AS is used to alias tables or columns. FROM identifies the table being queried. JOIN is used to connect two tables to get the desired data. WHERE is used to specify the standard of selection. LIKE is used to search for a specified pattern in a column in a WHERE clause. BETWEEN ... AND is used to select a range of data between two values. AND and OR is used to filter records based on more than one condition. GROUP BY groups the result by the given expression(s). HAVING filters rows after grouping. ORDER BY sorts the result based on the given expression, usually uses with TOPN function. LIMIT limits the number of rows returned by the query.

Example:

SELECT COUNT(*) FROM kylin_sales;

SELECT COUNT(DISTINCT seller_id) FROM kylin_sales;

SELECT seller_id, COUNT(1) FROM kylin_sales GROUP BY seller_id;

SELECT lstg_format_name, SUM(price) AS gmv, COUNT(DISTINCT seller_id) AS dist_seller FROM kylin_sales WHERE lstg_format_name='FP-GTC' GROUP BY lstg_format_name HAVING COUNT(DISTINCT seller_id) > 50;

SELECT lstg_format_name FROM kylin_sales INNER JOIN kylin_cal_dt AS kylin_cal_dt ON kylin_sales.part_dt = kylin_cal_dt.cal_dt WHERE NOT(lstg_format_name NOT LIKE '%ab%') GROUP BY lstg_format_name;

SELECT kylin_cal_dt.cal_dt FROM kylin_sales INNER JOIN kylin_cal_dt AS kylin_cal_dt ON kylin_sales.part_dt = kylin_cal_dt.cal_dt WHERE kylin_cal_dt.cal_dt BETWEEN DATE '2013-01-01' AND DATE '2013-06-04';

SELECT kylin_sales.lstg_format_name, SUM(kylin_sales.price) AS gmv, COUNT(*) AS trans_cnt FROM kylin_sales WHERE kylin_sales.lstg_format_name IS NULL GROUP BY kylin_sales.lstg_format_name HAVING SUM(price)>5000 AND COUNT(*)>72; 

SELECT kylin_sales.lstg_format_name, SUM(kylin_sales.price) AS gmv, COUNT(*) AS trans_cnt FROM kylin_sales WHERE kylin_sales.lstg_format_name IS NOT NULL GROUP BY kylin_sales.lstg_format_name HAVING SUM(price)>5000 OR COUNT(*)>20;

SELECT lstg_format_name, SUM(price) AS gmv, COUNT(1) AS trans_cnt FROM kylin_sales WHERE lstg_format_name='FP-GTC' GROUP BY lstg_format_name ORDER BY lstg_format_name LIMIT 10;

EXPRESSION

An expression in a SELECT statement. All columns in a table may be selected using *. Example:

*
ID AS VALUE
VALUE + 1

SUBQUERY

Example:

SELECT cal_dt ,sum(price) AS sum_price FROM (SELECT kylin_cal_dt.cal_dt, kylin_sales.price FROM kylin_sales INNER JOIN kylin_cal_dt AS kylin_cal_dt ON kylin_sales.part_dt = kylin_cal_dt.cal_dt INNER JOIN kylin_category_groupings ON kylin_sales.leaf_categ_id = kylin_category_groupings.leaf_categ_id AND kylin_sales.lstg_site_id = kylin_category_groupings.site_id) t GROUP BY cal_dt;

JOIN

INNER JOIN

The INNER JOIN keyword returns rows when there is at least one match in the table.

Example:

SELECT kylin_cal_dt.cal_dt, kylin_sales.price FROM kylin_sales INNER JOIN kylin_cal_dt AS kylin_cal_dt ON kylin_sales.part_dt = kylin_cal_dt.cal_dt;

LEFT JOIN

The LEFT JOIN keyword returns all rows from the left table (kylin_sales), even if there are no matching rows in the right table (kylin_category_groupings).

Example:

SELECT seller_id FROM kylin_sales LEFT JOIN kylin_category_groupings AS kylin_category_groupings ON kylin_sales.leaf_categ_id = kylin_category_groupings.leaf_categ_id AND kylin_sales.lstg_site_id = kylin_category_groupings.site_id WHERE lstg_format_name='FP-GTC' GROUP BY seller_id LIMIT 20;

UNION

The UNION operator is used to combine the result sets of two or more SELECT statements. Note that the SELECT statement inside UNION must have the same number of columns. Columns must also have similar data types. At the same time, the order of the columns in each SELECT statement must be the same. By default, the UNION operator picks a different value. If you allow duplicate values, use UNION ALL.

Example:

SELECT SUM(price) AS x, seller_id, COUNT(*) AS y FROM kylin_sales WHERE part_dt < DATE '2012-08-01' GROUP BY seller_id UNION (SELECT SUM(price) AS x, seller_id, COUNT(*) AS y FROM kylin_sales WHERE part_dt > DATE '2012-12-01' GROUP BY seller_id);

UNION ALL

The UNION ALL command is almost equivalent to the UNION command, but the UNION ALL command lists all values.

Example:

SELECT COUNT(trans_id) AS trans_id FROM kylin_sales AS test_a WHERE trans_id <> 1 UNION ALL SELECT COUNT(trans_id) AS trans_id FROM kylin_sales AS test_b;

COUNT

Returns the number of rows matching the specified criteria.

COUNT(COLUMN)

Example:

SELECT COUNT(seller_id) FROM kylin_sales;

COUNT(*)

Example:

SELECT COUNT(*) FROM kylin_sales;

COUNT_DISTINCT

Example:

SELECT COUNT(DISTINCT seller_id) AS DIST_SELLER FROM kylin_sales;

MAX

Returns the maximum value in a column. NULL values are not included in the calculation. Example:

SELECT MAX(lstg_site_id) FROM kylin_sales;

MIN

Returns the minimum value in a column. NULL values are not included in the calculation. Example:

SELECT MIN(lstg_site_id) FROM kylin_sales;

PERCENTILE

Example:

SELECT seller_id, PERCENTILE(price, 0.5) FROM kylin_sales GROUP BY seller_id;

SELECT seller_id, PERCENTILE_APPROX(price, 0.5) FROM kylin_sales GROUP BY seller_id;

SUM

Returns the total number of numeric columns. Example:

SELECT SUM(price) FROM kylin_sales;

TOP_N

Example:

SELECT SUM(price) AS gmv
 FROM kylin_sales 
INNER JOIN kylin_cal_dt AS kylin_cal_dt
 ON kylin_sales.part_dt = kylin_cal_dt.cal_dt
 INNER JOIN kylin_category_groupings
 ON kylin_sales.leaf_categ_id = kylin_category_groupings.leaf_categ_id AND kylin_sales.lstg_site_id = kylin_category_groupings.site_id
 WHERE kylin_cal_dt.cal_dt between DATE '2013-09-01' AND DATE '2013-10-01' AND (lstg_format_name='FP-GTC' OR 'a' = 'b')
 GROUP BY kylin_cal_dt.cal_dt;
 
SELECT kylin_sales.part_dt, seller_id
FROM kylin_sales
INNER JOIN kylin_cal_dt AS kylin_cal_dt
ON kylin_sales.part_dt = kylin_cal_dt.cal_dt
INNER JOIN kylin_category_groupings
ON kylin_sales.leaf_categ_id = kylin_category_groupings.leaf_categ_id
AND kylin_sales.lstg_site_id = kylin_category_groupings.site_id 
GROUP BY 
kylin_sales.part_dt, kylin_sales.seller_id ORDER BY SUM(kylin_sales.price) DESC LIMIT 20;

WINDOW

The WINDOW function performs the calculation on a set of table rows associated with the current row. NOTE: OVER clause is necessary for window functions.

ROW_NUMBER

Example:

SELECT lstg_format_name, SUM(price) AS gmv, ROW_NUMBER() OVER() FROM kylin_sales GROUP BY lstg_format_name;

AVG

Returns the average of the numeric columns. NULL values are not included in the calculation. Example:

SELECT lstg_format_name, AVG(SUM(price)) OVER(PARTITION BY lstg_format_name) FROM kylin_sales GROUP BY part_dt, lstg_format_name;

RANK

Example:

SELECT part_dt, lstg_format_name, SUM(price), RANK() OVER(PARTITION BY lstg_format_name ORDER BY part_dt) AS "rank" FROM kylin_sales GROUP BY part_dt, lstg_format_name;

DENSE_RANK

Example:

SELECT part_dt, lstg_format_name, SUM(price), DENSE_RANK() OVER(PARTITION BY lstg_format_name ORDER BY part_dt) AS "dense_rank" FROM kylin_sales GROUP BY part_dt, lstg_format_name;

FIRST_VALUE

Example:

SELECT part_dt, lstg_format_name, SUM(price) AS gmv, FIRST_VALUE(SUM(price)) OVER(PARTITION BY lstg_format_name ORDER BY part_dt) AS "first" FROM kylin_sales WHERE part_dt < '2012-02-01' GROUP BY part_dt, lstg_format_name;

LAST_VALUE

Example:

SELECT part_dt, lstg_format_name, SUM(price) AS gmv, LAST_VALUE(SUM(price)) OVER(PARTITION BY lstg_format_name ORDER BY part_dt) AS "current" FROM kylin_sales WHERE part_dt < '2012-02-01' GROUP BY part_dt, lstg_format_name;

LAG

Example:

SELECT part_dt, lstg_format_name, SUM(price) AS gmv, LAG(SUM(price), 1, 0.0) OVER(PARTITION BY lstg_format_name ORDER BY part_dt) AS "prev" FROM kylin_sales WHERE part_dt < '2012-02-01' GROUP BY part_dt, lstg_format_name;

LEAD

Example:

SELECT part_dt, lstg_format_name, SUM(price) AS gmv, LEAD(SUM(price), 1, 0.0) OVER(PARTITION BY lstg_format_name ORDER BY part_dt) AS "next" FROM kylin_sales WHERE part_dt < '2012-02-01' GROUP BY part_dt, lstg_format_name;

NTILE

Example:

SELECT part_dt, lstg_format_name, SUM(price) AS gmv, NTILE(4) OVER (PARTITION BY lstg_format_name ORDER BY part_dt) AS "quarter" FROM kylin_sales WHERE part_dt < '2012-02-01' GROUP BY part_dt, lstg_format_name;

CASE WHEN

Example:

SELECT part_dt, lstg_format_name, SUM(price) AS gmv, (CASE LAG(SUM(price), 1, 0.0) OVER(PARTITION BY lstg_format_name ORDER BY part_dt) WHEN 0.0 THEN 0 ELSE SUM(price)/LAG(SUM(price), 1, 0.0) OVER(PARTITION BY lstg_format_name ORDER BY part_dt) END) AS "prev" FROM kylin_sales WHERE part_dt < '2012-02-01' GROUP BY part_dt, lstg_format_name;

CAST

The keyword RANGE, INTERVAL to specify a range. PRECEDING means the first few days (second/minute/hour/month/year). FOLLOWING means the next few days (second/minute/hour/month/year). Example:

SELECT part_dt, lstg_format_name, SUM(price) AS gmv, FIRST_VALUE(SUM(price)) OVER (PARTITION BY lstg_format_name ORDER BY CAST(part_dt AS timestamp) RANGE INTERVAL '3' DAY PRECEDING) AS "prev 3 days", LAST_VALUE(SUM(price)) OVER (PARTITION BY lstg_format_name ORDER BY CAST(part_dt AS timestamp) RANGE INTERVAL '3' DAY FOLLOWING) AS "next 3 days" FROM kylin_sales WHERE part_dt < '2012-02-01' GROUP BY part_dt, lstg_format_name;

SUBSTRING

Example:

SELECT SUBSTRING(lstg_format_name, 1) FROM kylin_sales;

COALESCE

Example:

SELECT COALESCE(lstg_format_name, '888888888888') FROM kylin_sales;

STDDEV_SUM

Example: The first query is shortcut of the second query. stddev_sum is a UDAF which introduced in KYLIN-3361.

select A, stddev_sum(sampling_dim, m)
from T
group by A

select A, stddev(SUM_M)
from (
      select A, sampling_dim, sum(m) as SUM_M
      from T
      group by A, sampling_dim
) a
group by A

INTERSECT_COUNT

INTERSECT_COUNT is used to calculate the retention rate. The measure to be calculated have defined precisely count distinct measure. Example 1: Refer to intersect_count

select city, version,
intersect_count(uuid, dt, array['20161014']) as first_day,
intersect_count(uuid, dt, array['20161015']) as second_day,
intersect_count(uuid, dt, array['20161016']) as third_day,
intersect_count(uuid, dt, array['20161014', '20161015']) as retention_oneday,
intersect_count(uuid, dt, array['20161014', '20161015', '20161016']) as retention_twoday
from visit_log
where dt in ('2016104', '20161015', '20161016')
group by city, version

Example 1: Refer to KYLIN-4314

select city, version,
intersect_count(uuid, dt, array['20161014']) as first_day,
intersect_count(uuid, dt, array['20161015']) as second_day,
intersect_count(uuid, dt, array['20161016']) as third_day,
intersect_count(uuid, dt, array['20161014', '20161015']) as retention_oneday,
intersect_count(uuid, dt, array['20161014|20161015', '20161016']) as retention_twoday
from visit_log
where dt in ('2016104', '20161015', '20161016')
group by city, version

INTERSECT_VALUE

INTERSECT_COUNT returns the bitmap details of the retained value. The measure to be calculated have defined precisely count distinct measure. Example：

select city, version,
intersect_value(uuid, dt, array['20161014']) as first_day,
intersect_value(uuid, dt, array['20161015']) as second_day,
intersect_value(uuid, dt, array['20161016']) as third_day,
intersect_value(uuid, dt, array['20161014', '20161015']) as retention_oneday,
intersect_value(uuid, dt, array['20161014|20161015', '20161016']) as retention_twoday
from visit_log
where dt in ('2016104', '20161015', '20161016')
group by city, version

DATA TYPE

| ---------- | ---------- | ---------- | ---------- | -------------------- | | ANY | CHAR | VARCHAR | STRING | BOOLEAN | | BYTE | BINARY | INT | SHORT | LONG | | INTEGER | TINYINT | SMALLINT | BIGINT | TIMESTAMP | | FLOAT | REAL | DOUBLE | DECIMAL | DATETIME | | NUMERIC | DATE | TIME | | |

Grammar​

Function​

Data Type​

Sql Execution Plan​

QUERY SYNTAX​

SELECT​

STATEMENT​

EXPRESSION​

SUBQUERY​

JOIN​

INNER JOIN​

LEFT JOIN​

UNION​

UNION ALL​

COUNT​

COUNT(COLUMN)​

COUNT(*)​

COUNT_DISTINCT​

MAX​

MIN​

PERCENTILE​

SUM​

TOP_N​

WINDOW​

ROW_NUMBER​

AVG​

RANK​

DENSE_RANK​

FIRST_VALUE​

LAST_VALUE​

LAG​

LEAD​

NTILE​

CASE WHEN​

CAST​

SUBSTRING​

COALESCE​

STDDEV_SUM​

INTERSECT_COUNT​

INTERSECT_VALUE​

DATA TYPE​

Grammar

Function

Data Type

Sql Execution Plan

QUERY SYNTAX

SELECT

STATEMENT

EXPRESSION

SUBQUERY

JOIN

INNER JOIN

LEFT JOIN

UNION

UNION ALL

COUNT

COUNT(COLUMN)

COUNT(*)

COUNT_DISTINCT

MAX

MIN

PERCENTILE

SUM

TOP_N

WINDOW

ROW_NUMBER

AVG

RANK

DENSE_RANK

FIRST_VALUE

LAST_VALUE

LAG

LEAD

NTILE

CASE WHEN

CAST

SUBSTRING

COALESCE

STDDEV_SUM

INTERSECT_COUNT

INTERSECT_VALUE

DATA TYPE