Kylin Guardian Process
Since Kylin 5.0, the system added a function of a daemon process for monitoring the health state of Kylin. This function is called Kylin Guardian Process. If the Kylin Guardian Process detects Kylin is in an unhealthy state, it will restart Kylin server or downgrade service.
Usage
Turn On
Kylin Guardian Process is disabled by default. If you want to enable it, you need to add the configuration kylin.guardian.enabled = true
in the global configuration file $KYLIN_HOME/conf/kylin.properties
.
Note: All the following configurations take effect if
kylin.guardian.enabled = true
If Kylin Guardian Process is enabled, a daemon process will be automatically started after starting Kylin. This process is bound to environment variable KYLIN_HOME
, which means each Kylin instance has only one Kylin Guardian Process corresponding to it.
Kylin Guardian Process description:
-
The process ID is recorded in
$KYLIN_HOME/kgid
. -
The log of the process is output in
$KYLIN_HOME/logs/guardian.log
. -
Kylin Guardian Process will periodically check the health status of Kylin. The time delay of the first check is configured by the parameter
kylin.guardian.check-init-delay
(Unit: minutes), the default is 5 minutes, and the check interval is set by the parameterkylin.guardian.check-interval
(Unit: minutes), the default is 1 minute.
Check Items
Kylin Guardian Process currently checks the following 4 aspects of Kylin instance's health.
-
Kylin process status
If the process number file
$KYLIN_HOME/pid
exists and the corresponding process does not exist, it means Kylin server is in an abnormal down state, and Kylin Guardian Process will restart it. -
Spark Context restart failure check
If the number of Spark Context restart failure times is greater than or equals to the value of configuration
kylin.guardian.restart-spark-fail-threshold
, which is 3 times by default, Kylin Guardian Process will restart Kylin. This function is enabled by default. If you want to disable it, please add the configurationkylin.guardian.restart-spark-fail-restart-enabled = false
in$KYLIN_HOME/conf/kylin.properties
. -
Bad Query canceled failed check
Note: Some queries will be forcibly closed due to abnormal reasons. At this time, the query is Bad Query, and the common case is a timeout query.
If Kylin Guardian Process detects the number of Bad Query cancellation times is greater than or equals to the value of configuration
kylin.guardian.kill-slow-query-fail-threshold
, which is 3 times by default, Kylin Guardian Process will restart Kylin. It is enabled by default. If you want to disable it, you can add the configurationkylin.guardian.kill-slow-query-fail-restart-enabled = false
in$KYLIN_HOME/conf/kylin.properties
. -
Full GC(Garbage Collection, Garbage collection mechanism in Java) duration check
If the Full GC duration ratio in most recent period (default is value of
kylin.guardian.full-gc-check-factor
* value ofkylin.guardian.check-interval
) is greater than or equals to the value of configurationkylin.guardian.full-gc-duration-ratio-threshold
which is 75% by default, Kylin Guardian Process will restart Kylin. It is enabled by default. If you want to disable it, you can add the configurationkylin.guardian.full-gc-duration-ratio-restart-enabled = false
in$KYLIN_HOME/conf/kylin.properties
.
Kylin Guardian Process High Availability
To ensure the high availability of Kylin Guardian Process, Kylin will also periodically check the status of Kylin Guardian Process. If Kylin detects the Kylin Guardian Process does not exist, it will automatically start it. The feature is enabled by default. If you want to disable it, you can add the configuration kylin.guardian.ha-enabled=false
in $$KYLIN_HOME/conf/kylin.properties
. The time delay of the first check is configured by the parameter kylin.guardian.ha-check-init-delay
(Unit: minutes) which is 5 minutes by default, and the check interval is set by the parameter kylin.guardian.ha-check-interval
(Unit: minutes) which is 1 minute by default.
Kylin OOM(Out of Memory) restarts automatically
Kylin Guardian Process supports restarting Kylin when the JVM of Kylin appears OOM.