Skip to main content

How to debug Kylin in IDEA with Hadoop

Background

Why debug Kylin in IDEA using docker

This article aims to introduce a simple and useful way to develop and debug Kylin for developers and provided a similar deployment to the user's real scenario.

Deployment architecture

Following is the architecture of the current deployment.

debug_in_laptop

This guide assumes you have prepared the following things:

  • A laptop with macOS installed to do development work (Windows is not verified at the moment)
  • A remote Linux server for testing and deployment purposes (if you do not prepare a remote Linux server, you will deploy Hadoop on your laptop)
  • Kaylin's source code is cloned into some directory on your laptop
For Windows Dev Machine

For Windows dev machine, setup the Kylin dev env in Windows Subsystem for Linux is the best option. Follow this guide on how to install WSL with GUI, and install both the Kylin code and your favorite IDE (but not the docker) in WSL for best performance.

Prepare IDEA and build source code

Step 1: Check Software Requirement

Please visit software_requirement, and make sure your laptop has met the requirement.

Step 2: Build source code

  • Build backend source code before your start debugging.
cd <path-to-kylin-source>
mvn clean install -DskipTests
cd kystudio
npm install

Step 3: Install IntelliJ IDEA and build the source

  1. Install the IDEA Community edition (the Ultimate edition is ok too).

  2. Import the source code into IDEA. Click the Open, and choose the directory of Kylin source code.

  3. Install the scala plugin and restart

  4. Configure SDK(JDK and Scala), and make sure you use JDK 1.8.X and Scala 2.12.X.

  5. Reload maven projects, and the scala directory will be marked as source root(in blue color).

  6. Build the projects.(make sure you have executed mvn clean package -DskipTests, otherwise some source code is not generated by maven javacc plugin)

Step 4: Prepare IDEA configuration

  1. Download spark and create running IDEA configuration for debug purpose.

    ./dev-support/sandbox/sandbox.sh init

    Following is the shell output. sandbox.sh init

Prepare the Hadoop Cluster

Deploy Hadoop Cluster

  1. Install latest docker desktop in your laptop
  2. [Optional] Install docker engine on remote machine (https://docs.docker.com/engine/install/)
tip

It is recommended to use remote server other than laptop to deploy Hadoop Cluster, because 7-8 containers may consume a lot of hardware resources and cause your laptop run slower than before.

  1. [Optional] If you want to deploy hadoop cluster on remote machine, please set correct DOCKER_HOST.

    If you don't set DOCKER_HOST, you will deploy Hadoop Cluster in your laptop. (Make sure you can log in this server via SSH without input password each time by ssh-copy-id.)

    # see more detail at : https://docs.docker.com/compose/reference/envvars/#docker_host
    export DOCKER_HOST=ssh://${USER}@${DOCKER_HOST}
  2. Check available resource of your docker desktop in laptop (or docker engine in remote server, depends on which machine you want to deploy Hadoop Cluster), make sure you leave 6 CPUs and 12 GB memory at least .

    Following is the setting page of Docker Desktop of MacBook.

    resources

  3. Deploy hadoop cluster via docker compose on laptop(or on remote machine)

    ./dev-support/sandbox/sandbox.sh up

    sandbox.sh up

Check status of hadoop cluster

  • Wait 2-5 minutes, check health of hadoop, you can use following command to check status
./dev-support/sandbox/sandbox.sh ps

Following output content shows all hadoop component are in health state.

sandbox.sh ps

  • Edit /etc/hosts. (if you deployed Hadoop cluster on remote machine, please use correct ip address other than 127.0.0.1 .)
127.0.0.1 namenode datanode resourcemanager nodemanager historyserver mysql zookeeper hivemetastore hiveserver 
  • Load sample SSB data into HDFS and Hive
./dev-support/sandbox/sandbox.sh sample
  • Check hive table

hive table

Debug Kylin in IDEA

Start backend in IDEA

  • Select "BootstrapServer[docker-sandbox]" on top of IDEA and click Run .

click BootstrapServer[docker-sandbox]

  • Wait and check if Sparder is start succeed.

Spark is submitted

  • Check if SparkUI of Sparder is started.

Spark UI

Start frontend in IDEA

  • Set up dev proxy
cd kystudio
npm run devproxy

setup front end

Validate Kylin's core functions

  • Visit Kylin WEB UI in your laptop

setup front end

  • Create a new project, load table and create model

  • Validate Cube Build and Query function

build job

query a agg index

Command manual

  1. Use ./dev-support/sandbox/sandbox.sh stop to stop all containers
  2. Use ./dev-support/sandbox/sandbox.sh start to start all containers
  3. Use ./dev-support/sandbox/sandbox.sh ps to check status of all containers
  4. Use ./dev-support/sandbox/sandbox.sh down to stop all containers and delete them

Q&A

// todo