Service Monitoring
Kylin monitors the service status of its main components so that administrators can check service health and maintain instances.
Currently, the core components in Kylin are monitored as follows:
- Query: each Query node records its service status in InfluxDB.
- Build: each All node records its service status and job status in InfluxDB.
Two REST APIs expose the service status so that you can integrate it with your own monitoring platform:
- Get the Kylin cluster status, which is determined by monitoring the query and building services. If the status is WARNING or CRASH, the cluster is unstable.
- Get the service unavailable time within a specified time range, together with detailed monitoring data that helps administrators trace and review past issues.
How to Use
Get Cluster Status
GET http://host:port/kylin/api/monitor/status
- HTTP Header
  - Accept: application/vnd.apache.kylin-v4-public+json
  - Accept-Language: en
  - Content-Type: application/json;charset=utf-8
- Curl Request Example
curl -X GET \
'http://host:port/kylin/api/monitor/status' \
-H 'Accept: application/vnd.apache.kylin-v4-public+json' \
-H 'Accept-Language: en' \
-H 'Authorization: Basic QURNSU46S1lMSU4=' \
-H 'Content-Type: application/json;charset=utf-8'
- Response Details
  - active_instances: number of active instances in the current cluster.
  - query_status: query service status. It can be GOOD, WARNING, or CRASH.
  - job_status: building service status. It can be GOOD, WARNING, or CRASH.
  - job: job instance status. It lists each job instance and its status.
  - query: query instance status. It lists each query instance and its status.
- Response Example
{
"code": "000",
"data": {
"active_instances": 1,
"query_status": "GOOD",
"job_status": "GOOD",
"job": [
{
"instance": "sandbox.hortonworks.com:7070",
"status": "GOOD"
}
],
"query": [
{
"instance": "sandbox.hortonworks.com:7070",
"status": "GOOD"
}
]
},
"msg": ""
}
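As a rough illustration of how a monitoring platform might consume this endpoint, the Python sketch below builds the request with the headers listed above and checks the response for an unstable cluster. The helper names (`build_status_request`, `fetch_status`, `is_cluster_unstable`, `unstable_instances`) are hypothetical and not part of Kylin; only the URL path, the headers, and the response fields are taken from this page.

```python
import base64
import json
import urllib.request

# Statuses that this page describes as meaning "the cluster is unstable".
UNSTABLE = {"WARNING", "CRASH"}


def build_status_request(host, port, user, password):
    """Build (but do not send) the GET request for the cluster status API."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"http://{host}:{port}/kylin/api/monitor/status",
        headers={
            "Accept": "application/vnd.apache.kylin-v4-public+json",
            "Accept-Language": "en",
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json;charset=utf-8",
        },
        method="GET",
    )


def fetch_status(request, timeout=10):
    """Send the request and decode the JSON body (requires a reachable Kylin)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def is_cluster_unstable(payload):
    """True if the query or building service reports WARNING or CRASH."""
    data = payload["data"]
    return data["query_status"] in UNSTABLE or data["job_status"] in UNSTABLE


def unstable_instances(payload):
    """Return (service, instance, status) for each instance in WARNING or CRASH."""
    data = payload["data"]
    return [
        (service, node["instance"], node["status"])
        for service in ("job", "query")
        for node in data.get(service, [])
        if node["status"] in UNSTABLE
    ]
```

A caller would typically pass the result of `fetch_status(build_status_request(...))` to `is_cluster_unstable` and raise an alert in its own platform when that returns `True`, using `unstable_instances` to name the affected nodes.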
Get Cluster Status with Specific Time Range
GET http://host:port/kylin/api/monitor/status/statistics
- HTTP Header
  - Accept: application/vnd.apache.kylin-v4-public+json
  - Accept-Language: en
  - Content-Type: application/json;charset=utf-8
- URL Parameters
  - start - required, Long, timestamp in milliseconds. Returns the monitoring data whose time is greater than or equal to this timestamp.
  - end - required, Long, timestamp in milliseconds. Returns the monitoring data whose time is smaller than this timestamp.
- Curl Example
curl -X GET \
'http://host:port/kylin/api/monitor/status/statistics?start=1584151560000&end=1584151680000' \
-H 'Accept: application/vnd.apache.kylin-v4-public+json' \
-H 'Accept-Language: en' \
-H 'Authorization: Basic QURNSU46S1lMSU4=' \
-H 'Content-Type: application/json;charset=utf-8'
- Response Details
  - start: start time of monitoring. It is rounded down to a multiple of the monitoring data interval. With an interval of 1 minute, data is recorded at minute granularity only. For example, the argument 1587353550000 is treated as 1587353520000, so the returned data may not match the requested range exactly.
  - end: end time of monitoring. It is rounded down in the same way as start. For example, the argument 1587353550000 is treated as 1587353520000, so the same caveat applies.
  - interval: interval of the monitoring data. The default value is 60000 ms (1 minute).
  - job: job instance status. It lists each job instance with its detailed status records, unavailable time, and unavailable count. The unavailable time is in ms.
  - query: query instance status. It lists each query instance with its detailed status records, unavailable time, and unavailable count. The unavailable time is in ms.
- Response Example
{
"code":"000",
"data":{
"start":1584151560000,
"end":1584151680000,
"interval":60000,
"job":[
{
"instance":"sandbox.hortonworks.com:7070",
"details":[
{
"time":1584151572650,
"status":"GOOD"
},
{
"time":1584151632770,
"status":"GOOD"
}
],
"unavailable_time":0,
"unavailable_count":0
}
],
"query":[
{
"instance":"sandbox.hortonworks.com:7070",
"details":[
{
"time":1584151609142,
"status":"GOOD"
},
{
"time":1584151669142,
"status":"GOOD"
}
],
"unavailable_time":0,
"unavailable_count":0
}
]
},
"msg":""
}
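The rounding rule and the unavailable-time fields described above can be sketched in Python as follows. This is a minimal sketch and not Kylin code: the function names are hypothetical, and the availability ratio assumes that `unavailable_time` is measured against the returned `start` to `end` window, which this page does not state explicitly.

```python
from urllib.parse import urlencode


def floor_to_interval(timestamp_ms, interval_ms=60000):
    """Round a millisecond timestamp down to a multiple of the interval."""
    return timestamp_ms - (timestamp_ms % interval_ms)


def statistics_url(host, port, start_ms, end_ms):
    """Build the statistics URL; start is inclusive and end is exclusive."""
    query = urlencode({"start": start_ms, "end": end_ms})
    return f"http://{host}:{port}/kylin/api/monitor/status/statistics?{query}"


def availability(payload, service):
    """Map each instance of a service ("job" or "query") to its available fraction.

    Assumption: unavailable_time (ms) refers to the returned start..end window.
    """
    data = payload["data"]
    window_ms = data["end"] - data["start"]
    result = {}
    for node in data.get(service, []):
        if window_ms <= 0:
            # Empty window: the ratio is undefined, so report None.
            result[node["instance"]] = None
        else:
            # Cap downtime at the window length to keep the ratio in [0, 1].
            down_ms = min(node["unavailable_time"], window_ms)
            result[node["instance"]] = 1.0 - down_ms / window_ms
    return result
```

Aligning `start` and `end` with `floor_to_interval` before calling the API makes the requested range match the range the service reports back, which avoids surprises when comparing consecutive windows.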
Known Limitations
- The query used for detection is a constant query that does not scan HDFS files.
- InfluxDB is not yet highly available, so some monitoring data is lost while the InfluxDB service is down.
- The job status may be inaccurate after a large number of jobs are deleted or discarded.
- System monitoring depends on InfluxDB. If system monitoring stays enabled (it is enabled by default) while InfluxDB is not configured, useless errors may appear in the log. When InfluxDB is not configured, it is recommended to set kylin.monitor.enabled = false in kylin.properties to turn off system monitoring.