Prerequisites
To ensure system performance and stability, we recommend you run Kylin on a dedicated Hadoop cluster.
Before installing Kylin, please check that the following prerequisites are met:
- Environment
- Recommended Resource and Configuration
Supported Hadoop Distributions
Kylin has been verified to run on the following Hadoop distribution:
- Apache Hadoop 3.2.1
Kylin requires the following components. Please make sure each server has them available:
- Hive
- HDFS
- Yarn
- ZooKeeper
Prepare Environment
First, make sure you allocate sufficient resources for the environment. Please refer to Prerequisites for Kylin's detailed resource requirements. In addition, please ensure that HDFS, YARN, Hive, ZooKeeper and other components are running normally, without any warnings.
Additional Configuration Required for Apache Hadoop
Add the following two configurations to $KYLIN_HOME/conf/kylin.properties:
- kylin.env.apache-hadoop-conf-dir: the Hadoop configuration directory in the Hadoop environment
- kylin.env.apache-hive-conf-dir: the Hive configuration directory in the Hadoop environment
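If you prefer to script this step, the following shell sketch sets both properties idempotently. The helper name `set_kylin_prop` is hypothetical, and it only recognizes existing entries written exactly as `key=value` with no spaces around `=`. The directories in the example comments are assumptions based on the default layout of the Apache Hadoop and Hive tarballs (`$HADOOP_HOME/etc/hadoop` and `$HIVE_HOME/conf`); adjust them to your environment.

```shell
# Hypothetical helper: set KEY=VALUE in a .properties file, replacing any
# existing assignment of the same key instead of appending a duplicate.
set_kylin_prop() {
  local file="$1" key="$2" value="$3" tmp
  touch "$file"
  tmp="$(mktemp)"
  # Keep every line that does not start with "KEY=".
  awk -v prefix="${key}=" 'index($0, prefix) != 1' "$file" > "$tmp"
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Overwrite in place so the original file's permissions are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example (directories are assumptions, see above):
#   set_kylin_prop "$KYLIN_HOME/conf/kylin.properties" \
#     kylin.env.apache-hadoop-conf-dir "$HADOOP_HOME/etc/hadoop"
#   set_kylin_prop "$KYLIN_HOME/conf/kylin.properties" \
#     kylin.env.apache-hive-conf-dir "$HIVE_HOME/conf"
```

Rewriting the file rather than appending blindly means the helper can be re-run safely after a configuration change.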
JAR Package Required for Apache Hadoop
With Apache Hadoop 3.2.1, you also need to provide the MySQL JDBC driver in Kylin's runtime environment.
Download the MySQL 8.0 JDBC driver: https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.30/mysql-connector-java-8.0.30.jar
Place the JDBC driver in the $KYLIN_HOME/lib/ext directory.
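Here is a minimal shell sketch of the download-and-place step, assuming `curl` is available on the Kylin node. `install_mysql_jdbc` is a hypothetical helper. It copies an already-downloaded jar into `$KYLIN_HOME/lib/ext` and creates that directory if it is missing.

```shell
# Hypothetical helper: copy a downloaded MySQL JDBC jar into Kylin's
# external library directory, creating the directory if needed.
install_mysql_jdbc() {
  local jar="$1" kylin_home="${2:-$KYLIN_HOME}"
  if [ ! -f "$jar" ]; then
    echo "JDBC driver not found: $jar" >&2
    return 1
  fi
  if [ -z "$kylin_home" ]; then
    echo "KYLIN_HOME is not set" >&2
    return 1
  fi
  mkdir -p "$kylin_home/lib/ext"
  cp "$jar" "$kylin_home/lib/ext/"
}

# Example:
#   curl -fLO https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.30/mysql-connector-java-8.0.30.jar
#   install_mysql_jdbc mysql-connector-java-8.0.30.jar
```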
Java Environment
Kylin requires:
- JDK 8 as your environment's default JDK, at update 162 or later (JDK 1.8.0_162 or above)
You can check the JDK version of your existing environment with the following command. For JDK 8, the reported version starts with 1.8.0:
java -version
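You can also check the version requirement from a script. This sketch uses hypothetical helper names. It assumes the usual JDK 8 version format (for example `1.8.0_162`), which `java -version` prints to standard error inside a line such as `java version "1.8.0_162"` or `openjdk version "1.8.0_292"`.

```shell
# Hypothetical helper: succeed if the version string is JDK 8 at update 162 or later.
jdk8_update_ok() {
  local ver="$1" update
  case "$ver" in
    1.8.0_*) update="${ver#1.8.0_}" ;;
    *) return 1 ;;                  # not a JDK 8 version string
  esac
  update="${update%%[!0-9]*}"       # drop suffixes such as "-b12"
  [ -n "$update" ] && [ "$update" -ge 162 ]
}

# Hypothetical helper: extract the quoted version from `java -version`,
# which writes its output to standard error.
current_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Example:
#   if jdk8_update_ok "$(current_java_version)"; then echo "JDK OK"; else echo "JDK 1.8.0_162+ required"; fi
```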
Account Authority
The Linux account running Kylin must have the required access permissions to the cluster. These permissions include:
- Read/write permission on HDFS
- Create/read/write permission on Hive tables
Use the following steps to verify that the account (KyAdmin in these examples) has access to the Hadoop cluster:
- Verify the user has HDFS read and write permissions
Assuming the HDFS storage path for model data is /kylin, set it in conf/kylin.properties as:
kylin.env.hdfs-working-dir=/kylin
The storage folder must be created and granted the proper permissions. You may have to switch to the HDFS administrator (usually the hdfs user) to do this:
su hdfs
hdfs dfs -mkdir /kylin
hdfs dfs -chown KyAdmin /kylin
hdfs dfs -mkdir /user/KyAdmin
hdfs dfs -chown KyAdmin /user/KyAdmin
Then, as KyAdmin, verify that the user has read and write permissions:
hdfs dfs -put <any_file> /kylin
hdfs dfs -put <any_file> /user/KyAdmin
- Verify the KyAdmin user has Hive read and write permissions
Suppose you want to store a Hive table t1 in the Hive database kylinDB, where t1 contains two fields, id and name. Verify the Hive permissions as follows:
$ hive
hive> show databases;
hive> use kylinDB;
hive> show tables;
hive> insert into t1 values(1, "kylin");
hive> select * from t1;
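You can script both checks above for repeat runs. The shell sketch below uses hypothetical helper names. `verify_hdfs_rw` writes, reads back, and deletes a throwaway probe file (`_kylin_probe`) in each HDFS directory. `hive_permission_sql` prints HiveQL for the kylinDB/t1 example, which you can pass to `hive -e`. Unlike the interactive session above, it creates t1 if it does not exist yet, so it also exercises the create permission.

```shell
# Hypothetical helper: write, read back, and delete a probe file in each
# HDFS directory given, as the current (e.g. KyAdmin) user.
verify_hdfs_rw() {
  local probe dir rc=0
  probe="$(mktemp)"
  echo "kylin-permission-probe" > "$probe"
  for dir in "$@"; do
    if hdfs dfs -put -f "$probe" "$dir/_kylin_probe" \
      && hdfs dfs -cat "$dir/_kylin_probe" > /dev/null \
      && hdfs dfs -rm -skipTrash "$dir/_kylin_probe" > /dev/null; then
      echo "OK $dir"
    else
      echo "FAILED $dir" >&2
      rc=1
    fi
  done
  rm -f "$probe"
  return "$rc"
}

# Hypothetical helper: print HiveQL that exercises create/write/read on
# DB.TABLE with the two-column (id, name) layout used in the Hive example.
hive_permission_sql() {
  local db="$1" table="$2"
  cat <<EOF
USE ${db};
CREATE TABLE IF NOT EXISTS ${table} (id INT, name STRING);
INSERT INTO ${table} VALUES (1, 'kylin');
SELECT * FROM ${table};
EOF
}

# Example, run as KyAdmin:
#   verify_hdfs_rw /kylin /user/KyAdmin
#   hive -e "$(hive_permission_sql kylinDB t1)"
```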
Prepare Metadata DB
Kylin requires a configured metastore.
We recommend using PostgreSQL 10.7 as the metastore; it is provided in our package. Please refer to Use PostgreSQL as Metastore (Default) for installation steps and details.
If you want to use your own PostgreSQL database, the following versions are supported:
- PostgreSQL 9.1 or above
You can also choose MySQL, but we currently provide neither a MySQL installation package nor a JDBC driver, so you need to complete these prerequisites yourself before setting up. Please refer to Use MySQL as Metastore for installation steps and details. The following MySQL versions are supported:
- MySQL 5.1-5.7
- MySQL 5.7 (recommended)
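To confirm that an existing database falls within these ranges, you can compare its version string in a script. This sketch assumes GNU `sort` with `-V` (version sort). The helper names and the `PG_HOST`/`PG_USER`/`MYSQL_HOST`/`MYSQL_USER` variables in the comments are hypothetical.

```shell
# Hypothetical helper: succeed if version $1 >= version $2 (GNU sort -V).
version_at_least() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# PostgreSQL 9.1 or above; text after the first space (e.g. " (Ubuntu ...)") is ignored.
pg_version_supported() {
  version_at_least "${1%% *}" 9.1
}

# MySQL 5.1 through 5.7; suffixes such as "-log" are stripped first.
mysql_version_supported() {
  local v="${1%%-*}"
  version_at_least "$v" 5.1 && ! version_at_least "$v" 5.8
}

# Example:
#   pg_version_supported "$(psql -h "$PG_HOST" -U "$PG_USER" -d postgres -tAc 'SHOW server_version;')"
#   mysql_version_supported "$(mysql -h "$MYSQL_HOST" -u "$MYSQL_USER" -p -N -e 'SELECT VERSION();')"
```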
Prepare ZooKeeper
The following steps can be used to quickly verify the connectivity between ZooKeeper and Kylin after Kerberos is enabled.
1. Find the ZooKeeper working directory on the node where the ZooKeeper client is deployed.
2. Add or modify the Client section in the conf/jaas.conf file:
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/path/to/keytab_assigned_to_kylin"
  storeKey=true
  useTicketCache=false
  principal="principal_assigned_to_kylin";
};
3. Point the ZooKeeper client JVM at the JAAS file:
export JVMFLAGS="-Djava.security.auth.login.config=/path/to/jaas.conf"
4. Start the ZooKeeper CLI, where ${kylin.env.zookeeper-connect-string} stands for the value of that property in kylin.properties:
bin/zkCli.sh -server ${kylin.env.zookeeper-connect-string}
5. Verify that the ZooKeeper nodes can be listed normally, for example:
ls /
6. Clean up the Client section added in step 2 and unset the environment variable declared in step 3:
unset JVMFLAGS
If your ZooKeeper was not downloaded from the official website, consult your operations and maintenance team before performing the steps above.
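You can wrap the same check in a script that leaves ZooKeeper's own conf/jaas.conf untouched. In this sketch (helper names and the `ZK_HOME` argument are hypothetical), the Client section is written to a temporary file and `JVMFLAGS` is set only for a single `zkCli.sh` call, so nothing needs to be cleaned up afterwards. It assumes your `zkCli.sh` accepts a command after `-server` (here `ls /`) and runs it non-interactively, as recent ZooKeeper releases do.

```shell
# Hypothetical helper: write a JAAS "Client" section for Kerberos login to a file.
write_zk_client_jaas() {
  local file="$1" keytab="$2" principal="$3"
  cat > "$file" <<EOF
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="${keytab}"
  storeKey=true
  useTicketCache=false
  principal="${principal}";
};
EOF
}

# Hypothetical helper: list the ZooKeeper root as the Kylin principal.
# JVMFLAGS applies only to this zkCli.sh invocation, and the temporary
# JAAS file is removed afterwards.
zk_kerberos_ls_root() {
  local zk_home="$1" connect="$2" keytab="$3" principal="$4" jaas rc
  jaas="$(mktemp)"
  write_zk_client_jaas "$jaas" "$keytab" "$principal"
  JVMFLAGS="-Djava.security.auth.login.config=${jaas}" \
    "$zk_home/bin/zkCli.sh" -server "$connect" ls /
  rc=$?
  rm -f "$jaas"
  return "$rc"
}

# Example (connect string = value of kylin.env.zookeeper-connect-string):
#   zk_kerberos_ls_root "$ZK_HOME" zk1:2181 /path/to/keytab_assigned_to_kylin principal_assigned_to_kylin
```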
Network Port Requirements
Kylin needs to communicate with different components. The following table lists the ports that must be open to Kylin. It covers only the default configuration of the Hadoop environment and does not account for configuration differences between Hadoop platforms.
Component | Port | Function | Required |
---|---|---|---|
SSH | 22 | SSH access to the machine where Kylin is located | Y |
Kylin | 7070 | Kylin access port | Y |
Kylin | 7443 | Kylin HTTPS access port | N |
HDFS | 8020 | HDFS NameNode RPC port for client connections | Y |
HDFS | 50010 / 9866 | HDFS DataNode data transfer port (Hadoop 2 / Hadoop 3 default) | Y |
Hive | 10000 | HiveServer2 access port | N |
Hive | 9083 | Hive Metastore access port | Y |
ZooKeeper | 2181 | ZooKeeper access port | Y |
YARN | 8088 | YARN Web UI access port | Y |
YARN | 8090 | YARN Web UI HTTPS access port | N |
YARN | 8050 / 8032 | YARN ResourceManager communication port | Y |
Spark | 4041 | Kylin query engine Web UI default port | Y |
Spark | 18080 | Spark History Server port | N |
Spark | (1024, 65535] | Random ports used by the Spark driver and executors | Y |
InfluxDB | 8086 | InfluxDB HTTP port | N |
InfluxDB | 8088 | InfluxDB RPC port | N |
PostgreSQL | 5432 | PostgreSQL access port | Y |
MySQL | 3306 | MySQL access port | Y |
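From the Kylin node, you can run a TCP connect test to confirm that the required ports on other components are reachable. This sketch relies on bash's `/dev/tcp` redirection and the coreutils `timeout` command. `check_ports` and the host names in the comment are placeholders. An OPEN result only means a TCP connection was accepted, not that the service behind it is healthy. The randomly assigned Spark ports cannot be checked this way.

```shell
# Hypothetical helper: report whether each HOST:PORT accepts a TCP connection,
# waiting at most 3 seconds per port.
check_ports() {
  local hp host port
  for hp in "$@"; do
    host="${hp%:*}"
    port="${hp##*:}"
    if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      echo "OPEN ${hp}"
    else
      echo "CLOSED ${hp}"
    fi
  done
}

# Example (host names are placeholders):
#   check_ports namenode-host:8020 metastore-host:9083 zk-host:2181 rm-host:8032
```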
Hadoop Cluster Resource Allocation
To ensure Kylin works efficiently, please ensure the Hadoop cluster configurations satisfy the following conditions:
- yarn.nodemanager.resource.memory-mb: larger than 8192 MB
- yarn.scheduler.maximum-allocation-mb: larger than 4096 MB
- yarn.scheduler.maximum-allocation-vcores: larger than 5
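Here is a small shell sketch for comparing your cluster's values with these thresholds. The helper name is hypothetical. You supply the three values yourself, for example from yarn-site.xml. "Larger than" is applied strictly, as written above.

```shell
# Hypothetical helper: compare YARN settings with the thresholds above.
# Usage: check_yarn_allocation NM_MEMORY_MB MAX_ALLOC_MB MAX_ALLOC_VCORES
check_yarn_allocation() {
  local rc=0
  if [ "$1" -le 8192 ]; then
    echo "yarn.nodemanager.resource.memory-mb=$1, should be larger than 8192" >&2
    rc=1
  fi
  if [ "$2" -le 4096 ]; then
    echo "yarn.scheduler.maximum-allocation-mb=$2, should be larger than 4096" >&2
    rc=1
  fi
  if [ "$3" -le 5 ]; then
    echo "yarn.scheduler.maximum-allocation-vcores=$3, should be larger than 5" >&2
    rc=1
  fi
  return "$rc"
}

# Example:
#   check_yarn_allocation 16384 8192 8 && echo "YARN allocation OK"
```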
If you need to run Kylin in a sandbox or other virtual machine environment, please make sure the virtual machine environment has the following resources:
- No less than 4 processors
- No less than 10 GB of memory
- The configuration item yarn.nodemanager.resource.cpu-vcores set to no less than 8
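You can run a sketch like the following on the target machine to check the processor and memory requirements. It assumes Linux (`nproc` and `/proc/meminfo`) and uses a hypothetical helper name. MemTotal is usually slightly below the installed RAM, so a machine sized at exactly the minimum may be reported as short. You still need to check the yarn.nodemanager.resource.cpu-vcores value in the YARN configuration.

```shell
# Hypothetical helper: check this machine's processor count and memory.
# Usage: check_host_resources MIN_CPUS MIN_MEM_GB [CPUS] [MEM_KB]
# CPUS and MEM_KB default to `nproc` and MemTotal from /proc/meminfo.
check_host_resources() {
  local min_cpus="$1" min_mem_gb="$2" cpus mem_kb rc=0
  cpus="${3:-$(nproc)}"
  mem_kb="${4:-$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)}"
  if [ "$cpus" -lt "$min_cpus" ]; then
    echo "processors: $cpus, need at least $min_cpus" >&2
    rc=1
  fi
  # MemTotal is reported in kB.
  if [ "$mem_kb" -lt $(( min_mem_gb * 1024 * 1024 )) ]; then
    echo "memory: $mem_kb kB, need at least $min_mem_gb GB" >&2
    rc=1
  fi
  return "$rc"
}

# Example:
#   check_host_resources 4 10    # sandbox / virtual machine minimum
#   check_host_resources 16 64   # recommended hardware configuration
```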
Recommended Hardware Configuration
We recommend the following hardware configuration to install Kylin:
- 16 vCore, 64 GB memory
- At least 500 GB of disk space
- For network port requirements, please refer to the Network Port Requirements section.
Recommended Linux Distribution
We recommend using the following version of the Linux operating system:
- Ubuntu 18.04 or later (LTS versions recommended)
- Red Hat Enterprise Linux 6.4 or later
- CentOS 6.4 or later
Recommended Client Configuration
- Operating System: macOS / Windows 7 and above
- RAM: 8 GB or above
- Browser version:
  - Chrome 45 or above
  - Internet Explorer 11 or above