Deploy Kylin on AWS EC2 without Hadoop
Compared with Kylin 3.x, Kylin 4.0 implements a new Spark build engine and Parquet storage, making it possible to deploy Kylin without a Hadoop environment. Compared with deploying Kylin 3.x on AWS EMR, deploying Kylin 4.0 directly on AWS EC2 instances has the following advantages:
- Cost saving. AWS EC2 nodes cost less than AWS EMR nodes.
- More flexibility. On EC2 nodes, users can independently choose which services and components to install and deploy.
- No Hadoop dependency. The Hadoop ecosystem is heavy and carries a real maintenance cost; removing Hadoop brings the deployment closer to cloud-native.
After implementing support for building and querying in Spark Standalone mode, we deployed Kylin 4.0 without Hadoop on AWS EC2 instances and successfully built cubes and ran queries.
Environment preparation
- Provision AWS EC2 Linux instances as required
- Create an Amazon RDS for MySQL instance to serve as the metastore for both Kylin and Hive
- Use Amazon S3 as Kylin's storage (a provisioning sketch with the AWS CLI follows this list)
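For reference, the resources above can be created from the AWS CLI roughly as follows. This is a minimal sketch: the AMI ID, instance types, identifiers, bucket name, and credentials are placeholders, and region, networking, and security settings need to be adapted to your environment.

# Launch an EC2 Linux instance (AMI ID, instance type, and key pair are placeholders)
aws ec2 run-instances --image-id ami-xxxxxxxx --instance-type m5.xlarge --key-name my-key --count 1
# Create an RDS for MySQL instance to hold the Kylin and Hive metadata (identifier and credentials are placeholders)
aws rds create-db-instance --db-instance-identifier kylin-metastore --engine mysql \
  --db-instance-class db.m5.large --allocated-storage 20 \
  --master-username admin --master-user-password <your-password>
# Create an S3 bucket for Kylin's storage (bucket name is a placeholder)
aws s3 mb s3://my-kylin-bucket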
Component version information
The component versions listed here are the ones we selected during testing. If you need to deploy with other versions, you can substitute them yourself, but make sure the components remain compatible with each other. A download sketch for these components follows the list.
- JDK 1.8
- Hive 2.3.9
- Zookeeper 3.4.13
- Kylin 4.0 for Spark 3
- Spark 3.1.1
- Hadoop 3.2.0 (no startup required)
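For reference, the Apache components above can be downloaded from the Apache archive roughly as follows. This is a sketch; the exact file names and archive paths are assumptions based on the versions listed, so verify them against archive.apache.org before use, and obtain the JDK from your JDK vendor separately.

# Download the Apache components used in this test (verify URLs and checksums yourself)
wget https://archive.apache.org/dist/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz
wget https://archive.apache.org/dist/hive/hive-2.3.9/apache-hive-2.3.9-bin.tar.gz
wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.13/zookeeper-3.4.13.tar.gz
wget https://archive.apache.org/dist/spark/spark-3.1.1/spark-3.1.1-bin-hadoop3.2.tgz
wget https://archive.apache.org/dist/kylin/apache-kylin-4.0.0/apache-kylin-4.0.0-bin-spark3.tar.gz
# Example: extract Hadoop into the directory used by HADOOP_HOME in the next step
mkdir -p /etc/hadoop && tar -zxf hadoop-3.2.0.tar.gz -C /etc/hadoop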
Deployment process
1 Configure environment variables
- Modify /etc/profile
vim /etc/profile
# Add the following at the end of the profile file
export JAVA_HOME=/usr/local/java/jdk1.8.0_291
export JRE_HOME=${JAVA_HOME}/jre
export HADOOP_HOME=/etc/hadoop/hadoop-3.2.0
export HIVE_HOME=/etc/hadoop/hive
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=$HIVE_HOME/bin:$HIVE_HOME/conf:${HADOOP_HOME}/bin:${JAVA_HOME}/bin:$PATH
# After saving the file, run the following to apply the changes
source /etc/profile
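To confirm the variables took effect, a quick check along these lines can help; it assumes the installation directories match the paths set above.

# Verify that the environment variables are set in the current shell
echo $JAVA_HOME
echo $HADOOP_HOME
echo $HIVE_HOME
# Verify that the expected versions are picked up
java -version
${HADOOP_HOME}/bin/hadoop version
${HIVE_HOME}/bin/hive --version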