Apache Kylin | Develop JDBC Data Source

Available since Apache Kylin v2.6.0

Data source SDK

Since v2.6.0, Apache Kylin has provided a new data source framework, Data source SDK, whose APIs help developers handle dialect differences and implement a new data source easily.

How to develop

Configuration to implement a new data source

Data source SDK provides a conversion framework and ships with a predefined configuration file, default.xml, for the ANSI SQL dialect.

No coding is required for this step: developers only need to create a new configuration file, {dialect}.xml, for the new data source dialect.

Structure of the configuration:

Root node:

<DATASOURCE_DEF NAME="kylin" ID="mysql" DIALECT="mysql"/>

The value of ID is normally the same as the configuration file name.
The value of DIALECT mainly determines the quote characters used for database identifiers.
For example, MySQL uses `` and Microsoft SQL Server uses [].
Kylin DIALECT values map to Apache Calcite dialects as follows:

Dialect in Kylin	Dialect in Apache Calcite
default	SqlDialect.CALCITE
calcite	SqlDialect.CALCITE
greenplum	SqlDialect.DatabaseProduct.POSTGRESQL
postgresql	SqlDialect.DatabaseProduct.POSTGRESQL
mysql	SqlDialect.DatabaseProduct.MYSQL
mssql	SqlDialect.DatabaseProduct.MSSQL
oracle	SqlDialect.DatabaseProduct.ORACLE
vertica	SqlDialect.DatabaseProduct.VERTICA
redshift	SqlDialect.DatabaseProduct.REDSHIFT
hive	SqlDialect.DatabaseProduct.HIVE
h2	SqlDialect.DatabaseProduct.H2
unkown	SqlDialect.DUMMY
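
The effect of DIALECT on identifier quoting can be illustrated with a small standalone sketch. The class and method names below are hypothetical and not part of the Kylin SDK; the quote characters for mysql and mssql come from the text above, and the double-quote fallback is an assumption based on ANSI SQL. Escaping of quote characters inside an identifier is deliberately left out.

```java
// Hypothetical helper, not part of the Kylin SDK: shows how the quote
// characters for an identifier differ between dialects.
final class IdentifierQuoting {

    // Wraps an identifier in the quote characters of the given dialect.
    // Assumption: dialects not listed here fall back to ANSI double quotes.
    static String quote(String dialect, String identifier) {
        switch (dialect) {
            case "mysql":
                return "`" + identifier + "`";   // MySQL uses backticks
            case "mssql":
                return "[" + identifier + "]";   // SQL Server uses brackets
            default:
                return "\"" + identifier + "\""; // ANSI-style double quotes
        }
    }

    public static void main(String[] args) {
        System.out.println(quote("mysql", "order"));
        System.out.println(quote("mssql", "order"));
        System.out.println(quote("default", "order"));
    }
}
```

In the SDK itself the quoting is expected to be delegated to the Apache Calcite dialect listed in the table, so this sketch only mirrors the visible behavior.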

Property node:
Defines the properties of the dialect.

Property	Description
sql.default-converted-enabled	whether to enable SQL conversion
sql.allow-no-offset	whether a query without an offset is allowed
sql.allow-fetch-no-rows	whether fetching 0 rows is allowed
sql.allow-no-orderby-with-fetch	whether a fetch without ORDER BY is allowed
sql.keyword-default-escape	whether <default> is a keyword
sql.keyword-default-uppercase	whether <default> should be transformed to uppercase
sql.paging-type	paging type, such as LIMIT_OFFSET, FETCH_NEXT, or ROWNUM
sql.case-sensitive	whether identifiers are case sensitive
metadata.enable-cache	whether to enable the metadata cache when `sql.case-sensitive` is true
sql.enable-quote-all-identifiers	whether to quote all identifiers
transaction.isolation-level	transaction isolation level for Sqoop
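
To make the three sql.paging-type values more concrete, the following sketch shows the kind of SQL each style commonly corresponds to. It is an illustration only: the class, enum, and method are hypothetical, and the exact SQL that Kylin generates for each paging type may differ.

```java
// Hypothetical illustration of common paging styles; not Kylin's implementation.
final class PagingSketch {

    enum PagingType { LIMIT_OFFSET, FETCH_NEXT, ROWNUM }

    // Appends or wraps paging clauses around an already ordered query.
    static String page(PagingType type, String query, int limit, int offset) {
        switch (type) {
            case LIMIT_OFFSET:
                // Style used by MySQL and PostgreSQL, among others.
                return query + " LIMIT " + limit + " OFFSET " + offset;
            case FETCH_NEXT:
                // SQL:2008 style, used for example by SQL Server 2012 and later.
                return query + " OFFSET " + offset + " ROWS FETCH NEXT " + limit + " ROWS ONLY";
            case ROWNUM:
                // Classic Oracle idiom: filter on the ROWNUM pseudocolumn.
                return "SELECT * FROM (SELECT t.*, ROWNUM rn FROM (" + query
                        + ") t WHERE ROWNUM <= " + (offset + limit) + ") WHERE rn > " + offset;
            default:
                throw new IllegalArgumentException("Unknown paging type: " + type);
        }
    }

    public static void main(String[] args) {
        String query = "SELECT id FROM t ORDER BY id";
        for (PagingType type : PagingType.values()) {
            System.out.println(type + ": " + page(type, query, 10, 20));
        }
    }
}
```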

Function node:
Developers can define how functions are implemented in the target data source dialect.
For example, suppose we want to use Greenplum as a data source. Greenplum does not support functions such as TIMESTAMPDIFF, so we can define the following in greenplum.xml:

<FUNCTION_DEF ID="64" EXPRESSION="(CAST($1 AS DATE) - CAST($0 AS DATE))"/>

In contrast, the configuration in default.xml is:

<FUNCTION_DEF ID="64" EXPRESSION="TIMESTAMPDIFF(day, $0, $1)"/>

Data source SDK converts functions from the default dialect to the target dialect by matching definitions that share the same function ID.
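
The role of the $0 and $1 placeholders can be shown with a minimal standalone sketch. It assumes that $N stands for the N-th operand of the function call; the class and method names are hypothetical, and this is not the SDK's actual conversion code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of expanding a FUNCTION_DEF expression template.
final class FunctionTemplateDemo {

    // Matches placeholders such as $0, $1, $12.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$(\\d+)");

    // Replaces each $N in the template with the N-th operand.
    static String expand(String template, String... operands) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int index = Integer.parseInt(m.group(1));
            // quoteReplacement keeps '$' and '\' in operands literal.
            m.appendReplacement(out, Matcher.quoteReplacement(operands[index]));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Both templates carry ID 64 in the examples above.
        String defaultTemplate = "TIMESTAMPDIFF(day, $0, $1)";
        String greenplumTemplate = "(CAST($1 AS DATE) - CAST($0 AS DATE))";
        System.out.println(expand(defaultTemplate, "start_ts", "end_ts"));
        System.out.println(expand(greenplumTemplate, "start_ts", "end_ts"));
    }
}
```

With the operands start_ts and end_ts (made-up column names), the Greenplum template yields a date subtraction in place of the TIMESTAMPDIFF call.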

Type node:
Developers can define how types are implemented in the target data source dialect.
Taking Greenplum as an example again, Greenplum supports BIGINT instead of LONG, so we can define the following in greenplum.xml:

<TYPE_DEF ID="Long" EXPRESSION="BIGINT"/>

In contrast, the configuration in default.xml is:

<TYPE_DEF ID="Long" EXPRESSION="LONG"/>

Data source SDK converts types from the default dialect to the target dialect by matching definitions that share the same type ID.

Adaptor

Adaptor provides a set of APIs, for example to get metadata and data from the data source.
Data source SDK provides a default implementation; developers can create a new class that extends it and supply their own implementation.

org.apache.kylin.sdk.datasource.adaptor.DefaultAdaptor

Adaptor also reserves a function, fixSql(String sql).
If the SQL still does not fit the target dialect after the conversion framework has processed it, developers can implement this function to apply final fixes to the SQL.
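
As an illustration of the kind of last-step cleanup fixSql(String sql) could perform, the sketch below strips trailing semicolons. It is a standalone stand-in so that it runs without Kylin on the classpath: in a real adaptor the same logic would presumably live in an override of fixSql in a class extending DefaultAdaptor. The class name and the choice of fix are hypothetical.

```java
// Standalone stand-in for a custom adaptor's fixSql(String sql) override.
// The fix shown (dropping trailing semicolons) is only an example.
final class FixSqlSketch {

    // Removes surrounding whitespace and any trailing semicolons.
    static String fixSql(String sql) {
        String fixed = sql.trim();
        while (fixed.endsWith(";")) {
            fixed = fixed.substring(0, fixed.length() - 1).trim();
        }
        return fixed;
    }

    public static void main(String[] args) {
        System.out.println(fixSql("SELECT 1 ; "));
    }
}
```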

How to enable data source for Kylin

Some new configurations:

kylin.query.pushdown.runner-class-name=org.apache.kylin.query.pushdown.PushdownRunnerSDKImpl
kylin.source.default=16
kylin.source.jdbc.dialect={Dialect}
kylin.source.jdbc.adaptor={Class name of Adaptor}
kylin.source.jdbc.user={JDBC Connection Username}
kylin.source.jdbc.pass={JDBC Connection Password}
kylin.source.jdbc.connection-url={JDBC Connection String}
kylin.source.jdbc.driver={JDBC Driver Class Name}

Take MySQL as an example:

kylin.query.pushdown.runner-class-name=org.apache.kylin.query.pushdown.PushdownRunnerSDKImpl
kylin.source.default=16
kylin.source.jdbc.dialect=mysql
kylin.source.jdbc.adaptor=org.apache.kylin.sdk.datasource.adaptor.MysqlAdaptor
kylin.source.jdbc.user={MYSQL_USERNAME}
kylin.source.jdbc.pass={MYSQL_PASSWORD}
kylin.source.jdbc.connection-url=jdbc:mysql://{HOST_URL}:3306/{DATABASE_NAME}
kylin.source.jdbc.driver=com.mysql.jdbc.Driver

Put the configuration file {dialect}.xml under the directory $KYLIN_HOME/conf/datasource.
Create a jar file for the new Adaptor and put it under the directory $KYLIN_HOME/ext.

Other configurations are identical to those of the former JDBC connection; please refer to setup_jdbc_datasource.