Available since Apache Kylin v2.6.0
Data source SDK
Since v2.6.0 Apache Kylin provides a new data source framework Data source SDK, which provides APIs to help developers handle dialect differences and easily implement a new data source.
How to develop
Configuration to implement a new data source
Data source SDK provides a conversion framework and has pre-defined a configuration file default.xml for ansi sql dialect.
Developers do not need coding, what they should do is just create a new configuration file {dialect}.xml for the new data source dialect.
Structure of the configuration:
- Root node:
<DATASOURCE_DEF NAME="kylin" ID="mysql" DIALECT="mysql"/>
The value of ID is normally the same with configuration file.
The value of DIALECT is defined mainly for quote string for database identifier.
For example Mysql use ``, Microsoft sql server use [].
Mapping of Kylin DIALECT and Apache Calcite Dialect as belows:
Dialect in Kylin | Dialect in Apache Calcite |
default | SqlDialect.CALCITE |
calcite | SqlDialect.CALCITE |
greenplum | SqlDialect.DatabaseProduct.POSTGRESQL |
postgresql | SqlDialect.DatabaseProduct.POSTGRESQL |
mysql | SqlDialect.DatabaseProduct.MYSQL |
sql.keyword-default-uppercase | whether <default> should be transform to uppercase |
mssql | SqlDialect.DatabaseProduct.MSSQL |
oracle | SqlDialect.DatabaseProduct.ORACLE |
vertica | SqlDialect.DatabaseProduct.VERTICA |
redshift | SqlDialect.DatabaseProduct.REDSHIFT |
hive | SqlDialect.DatabaseProduct.HIVE |
h2 | SqlDialect.DatabaseProduct.H2 |
unkown | SqlDialect.DUMMY |
- Property node:
Define the properties of the dialect.
Property | Description |
sql.default-converted-enabled | whether enable convert |
sql.allow-no-offset | whether allow no offset |
sql.allow-fetch-no-rows | whether allow fetch 0 rows |
sql.allow-no-orderby-with-fetch | whether allow fetch without orderby |
sql.keyword-default-escape | whether <default> is keyword |
sql.keyword-default-uppercase | whether <default> should be transform to uppercase |
sql.paging-type | paging type like LIMIT_OFFSET, FETCH_NEXT, ROWNUM |
sql.case-sensitive | whether identifier is case sensitive |
metadata.enable-cache | whether enable cache for `sql.case-sensitive` is true |
sql.enable-quote-all-identifiers | whether enable quote |
transaction.isolation-level | transaction isolation level for sqoop |
- Function node:
Developers can define the functions implementation in target data source dialect.
For example, we want to implement Greenplum as data source, but Greenplum does not support function such as TIMESTAMPDIFF, so we can define in greenplum.xml
<FUNCTION_DEF ID="64" EXPRESSION="(CAST($1 AS DATE) - CAST($0 AS DATE))"/>
contrast with the configuration in default.xml
<FUNCTION_DEF ID="64" EXPRESSION="TIMESTAMPDIFF(day, $0, $1)"/>
Data source SDK provides conversion functions from default to target dialect with same function id.
- Type node:
Developers can define the types implementation in target data source dialect.
Also take Greenplum as example, Greenplum support BIGINT instead of LONG, so we can define in greenplum.xml
<TYPE_DEF ID="Long" EXPRESSION="BIGINT"/>
contrast with the configuration in default.xml
<TYPE_DEF ID="Long" EXPRESSION="LONG"/>
Data source SDK provides conversion types from default to target dialect with same type id.
Adaptor
Adaptor provides a list of API like get metadata and data from data source.
Data source SDK provides a default implementation,developers can create a new class to extends it and have their own implementation.
org.apache.kylin.sdk.datasource.adaptor.DefaultAdaptor
Adaptor also reserves a function fixSql(String sql).
After the conversion with the conversion framework, if the sql still have some problems to adapt the target dialect, developers can implement the function to fix sql finally.
How to enable data source for Kylin
Some new configurations:
kylin.query.pushdown.runner-class-name=org.apache.kylin.query.pushdown.PushdownRunnerSDKImpl
kylin.source.default=16
kylin.source.jdbc.dialect={Dialect}
kylin.source.jdbc.adaptor={Class name of Adaptor}
kylin.source.jdbc.user={JDBC Connection Username}
kylin.source.jdbc.pass={JDBC Connection Password}
kylin.source.jdbc.connection-url={JDBC Connection String}
kylin.source.jdbc.driver={JDBC Driver Class Name}
Take mysql as an example:
kylin.query.pushdown.runner-class-name=org.apache.kylin.query.pushdown.PushdownRunnerSDKImpl
kylin.source.default=16
kylin.source.jdbc.dialect=mysql
kylin.source.jdbc.adaptor=org.apache.kylin.sdk.datasource.adaptor.MysqlAdaptor
kylin.source.jdbc.user={MYSQL_USERNAME}
kylin.source.jdbc.pass={MYSQL_PASSWORD}
kylin.source.jdbc.connection-url=jdbc:mysql://{HOST_URL}:3306/{DATABASE_NAME}
kylin.source.jdbc.driver=com.mysql.jdbc.Driver
Put the configuration file {dialect}.xml under directory $KYLIN_HOME/conf/datasource.
Create jar file for the new Adaptor, and put under directory $KYLIN_HOME/ext.
Other configurations are identical with the former jdbc connection, please refer to setup_jdbc_datasource.