Available since Apache Kylin v2.6.0

Data source SDK

Since v2.6.0 Apache Kylin provides a new data source framework Data source SDK, which provides APIs to help developers handle dialect differences and easily implement a new data source.

How to develop

Configuration to implement a new data source

Data source SDK provides a conversion framework and has pre-defined a configuration file default.xml for ansi sql dialect.

Developers do not need coding, what they should do is just create a new configuration file {dialect}.xml for the new data source dialect.

Structure of the configuration:

  • Root node:
<DATASOURCE_DEF NAME="kylin" ID="mysql" DIALECT="mysql"/>

The value of ID is normally the same with configuration file.
The value of DIALECT is defined mainly for quote string for database identifier.
For example Mysql use ``, Microsoft sql server use [].
Mapping of Kylin DIALECT and Apache Calcite Dialect as belows:

Dialect in Kylin Dialect in Apache Calcite
default SqlDialect.CALCITE
calcite SqlDialect.CALCITE
greenplum SqlDialect.DatabaseProduct.POSTGRESQL
postgresql SqlDialect.DatabaseProduct.POSTGRESQL
mysql SqlDialect.DatabaseProduct.MYSQL
sql.keyword-default-uppercase whether <default> should be transform to uppercase
mssql SqlDialect.DatabaseProduct.MSSQL
oracle SqlDialect.DatabaseProduct.ORACLE
vertica SqlDialect.DatabaseProduct.VERTICA
redshift SqlDialect.DatabaseProduct.REDSHIFT
hive SqlDialect.DatabaseProduct.HIVE
h2 SqlDialect.DatabaseProduct.H2
unkown SqlDialect.DUMMY
  • Property node:
    Define the properties of the dialect.
Property Description
sql.default-converted-enabled whether enable convert
sql.allow-no-offset whether allow no offset
sql.allow-fetch-no-rows whether allow fetch 0 rows
sql.allow-no-orderby-with-fetch whether allow fetch without orderby
sql.keyword-default-escape whether <default> is keyword
sql.keyword-default-uppercase whether <default> should be transform to uppercase
sql.paging-type paging type like LIMIT_OFFSET, FETCH_NEXT, ROWNUM
sql.case-sensitive whether identifier is case sensitive
metadata.enable-cache whether enable cache for `sql.case-sensitive` is true
sql.enable-quote-all-identifiers whether enable quote
transaction.isolation-level transaction isolation level for sqoop
  • Function node:
    Developers can define the functions implementation in target data source dialect.
    For example, we want to implement Greenplum as data source, but Greenplum does not support function such as TIMESTAMPDIFF, so we can define in greenplum.xml
<FUNCTION_DEF ID="64" EXPRESSION="(CAST($1 AS DATE) - CAST($0 AS DATE))"/>

contrast with the configuration in default.xml

<FUNCTION_DEF ID="64" EXPRESSION="TIMESTAMPDIFF(day, $0, $1)"/>

Data source SDK provides conversion functions from default to target dialect with same function id.

  • Type node:
    Developers can define the types implementation in target data source dialect.
    Also take Greenplum as example, Greenplum support BIGINT instead of LONG, so we can define in greenplum.xml
<TYPE_DEF ID="Long" EXPRESSION="BIGINT"/>

contrast with the configuration in default.xml

<TYPE_DEF ID="Long" EXPRESSION="LONG"/>

Data source SDK provides conversion types from default to target dialect with same type id.

Adaptor

Adaptor provides a list of API like get metadata and data from data source.
Data source SDK provides a default implementation,developers can create a new class to extends it and have their own implementation.

org.apache.kylin.sdk.datasource.adaptor.DefaultAdaptor

Adaptor also reserves a function fixSql(String sql).
After the conversion with the conversion framework, if the sql still have some problems to adapt the target dialect, developers can implement the function to fix sql finally.

How to enable data source for Kylin

Some new configurations:

kylin.query.pushdown.runner-class-name=org.apache.kylin.query.pushdown.PushdownRunnerSDKImpl
kylin.source.default=16
kylin.source.jdbc.dialect={Dialect}
kylin.source.jdbc.adaptor={Class name of Adaptor}
kylin.source.jdbc.user={JDBC Connection Username}
kylin.source.jdbc.pass={JDBC Connection Password}
kylin.source.jdbc.connection-url={JDBC Connection String}
kylin.source.jdbc.driver={JDBC Driver Class Name}

Take mysql as an example:

kylin.query.pushdown.runner-class-name=org.apache.kylin.query.pushdown.PushdownRunnerSDKImpl
kylin.source.default=16
kylin.source.jdbc.dialect=mysql
kylin.source.jdbc.adaptor=org.apache.kylin.sdk.datasource.adaptor.MysqlAdaptor
kylin.source.jdbc.user={MYSQL_USERNAME}
kylin.source.jdbc.pass={MYSQL_PASSWORD}
kylin.source.jdbc.connection-url=jdbc:mysql://{HOST_URL}:3306/{DATABASE_NAME}
kylin.source.jdbc.driver=com.mysql.jdbc.Driver

Put the configuration file {dialect}.xml under directory $KYLIN_HOME/conf/datasource.
Create jar file for the new Adaptor, and put under directory $KYLIN_HOME/ext.

Other configurations are identical with the former jdbc connection, please refer to setup_jdbc_datasource.