InfluxDB Maintenance
This chapter covers basic maintenance tasks for InfluxDB.
Connectivity
When InfluxDB is not accessible, work through the following checks to locate the problem:
-
Check whether InfluxDB is running:
service influxdb status
If it is not running, check the log files /var/log/influxdb/influxd.log and /var/log/messages to find the cause. Then restart the service and watch the logs to confirm that it starts normally:
service influxdb restart
Once the service is up, you should be able to log in to InfluxDB with the following command:
influx -host ? -port ?
-
If startup fails because the port is already in use, find the ID of the process that holds the port:
netstat -anp | grep influxdb_port
Then look up the details of that process:
ps -ef | grep pid
You can either kill the process if you do not need it, or change InfluxDB's server port to another one.
-
If Kylin and InfluxDB are installed on different nodes, run the following command on the Kylin node to check whether the two nodes can communicate:
telnet influxdb_ip influxdb_port
If they cannot, make sure the firewall service is not running on the InfluxDB node, or contact the system administrator to check the network. To check the firewall status, run:
service iptables status
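On nodes where telnet is not installed, the reachability check above can be approximated with a small shell function. This is only a sketch: the name check_port is hypothetical, it assumes bash and the coreutils timeout command are available, and the 3-second timeout is an arbitrary choice.

```shell
# check_port HOST PORT: succeed if a TCP connection to HOST:PORT can be opened.
# Relies on bash's /dev/tcp pseudo-device (assumption: bash is installed).
check_port() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Example (influxdb_ip and influxdb_port are placeholders, as in the telnet command):
# check_port influxdb_ip influxdb_port && echo reachable || echo unreachable
```

A non-zero exit status only tells you that the connection could not be opened within the timeout. It does not distinguish a stopped service from a firewall rule, so the checks listed above still apply.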
Log Management
-
Log Configuration
- By default, InfluxDB writes its log to standard error. When the service starts, stderr is redirected to the following file:
/var/log/influxdb/influxd.log
To change the log path, set the following property in the configuration file /etc/default/influxdb, then restart the service with service influxdb restart:
STDERR=/path/to/influxdb.log
- InfluxDB enables the HTTP access log by default. This log is usually large. To disable it, set the following property in the [http] section of the configuration:
[http]
log-enabled=false
-
Log Clean
InfluxDB does not clean up its own logs. Log rotation is handled by logrotate, which is installed by default on most Linux systems, using the following configuration file:
/etc/logrotate.d/influxdb
By default, the log is rotated daily and kept for 7 days.
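To confirm the rotation policy on a given machine, you can print the relevant directives from the logrotate configuration. The function name rotation_policy is hypothetical, and the sketch assumes the file uses the standard logrotate directives daily, weekly, monthly, and rotate N.

```shell
# rotation_policy [FILE]: print the frequency and retention directives
# from a logrotate configuration file (default: the InfluxDB one).
rotation_policy() {
  grep -E '^[[:space:]]*(daily|weekly|monthly|rotate[[:space:]]+[0-9]+)' \
    "${1:-/etc/logrotate.d/influxdb}"
}

# Examples:
# rotation_policy                          # list the frequency and retention lines
# logrotate -d /etc/logrotate.d/influxdb   # dry run: report what would be rotated, change nothing
```

If the output differs from a daily rotation with 7 retained files, the configuration has been customized on that machine.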
Backup and Restore
InfluxDB supports backing up and restoring a database.
-
Backup
influxd backup -portable -database KYLIN_METRIC -host 127.0.0.1:8089 /path/to/backup
-
Restore
Please make sure that the database exists, otherwise the restore will fail.
influxd restore -portable -database KYLIN_METRIC -host 127.0.0.1:8089 /path/to/backup
Note: Replace KYLIN_METRIC with the actual database name, 127.0.0.1:8089 with the actual IP and port, and /path/to/backup with the backup path you want to use.
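For regular backups it can help to write each run into its own timestamped directory so that earlier backups are not overwritten. The following is a minimal sketch built on the backup command shown above. The function names backup_target and run_backup and the directory naming scheme are assumptions made for illustration.

```shell
# backup_target BASE_DIR DATABASE STAMP: build a per-run backup directory path.
backup_target() {
  printf '%s/%s-%s\n' "$1" "$2" "$3"
}

# run_backup DATABASE HOST:PORT BASE_DIR: back up into a fresh timestamped directory.
run_backup() {
  target=$(backup_target "$3" "$1" "$(date +%Y%m%d-%H%M%S)")
  mkdir -p "$target" &&
    influxd backup -portable -database "$1" -host "$2" "$target"
}

# Example, using the placeholder values from the commands above:
# run_backup KYLIN_METRIC 127.0.0.1:8089 /path/to/backup
```

To restore from one of these runs, pass the chosen timestamped directory to the restore command in place of /path/to/backup.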
Monitoring and Diagnosis
-
Memory Monitoring
-
Check runtime
Run the following command to check GC activity, memory usage, and other runtime statistics:
influx -database KYLIN_METRIC -execute "show stats for 'runtime'"
Pay particular attention to the following fields:
- HeapAlloc -> Bytes of allocated heap objects
- Sys -> Total bytes of memory obtained from the system
- NumGC -> Number of completed GC cycles
- PauseTotalNs -> Total GC pause time, in nanoseconds
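One way to read NumGC and PauseTotalNs together is to divide the total pause time by the number of GC cycles, which gives the average pause per cycle. The helper below is a sketch of that arithmetic. The name avg_gc_pause_ms is hypothetical and the example values are made up.

```shell
# avg_gc_pause_ms PAUSE_TOTAL_NS NUM_GC: mean GC pause per cycle, in milliseconds.
avg_gc_pause_ms() {
  awk -v ns="$1" -v n="$2" 'BEGIN {
    if (n == 0) { print "0.000"; exit }   # no GC cycles completed yet
    printf "%.3f\n", ns / n / 1000000     # nanoseconds -> milliseconds
  }'
}

# Example with made-up values: 50,000,000 ns of pauses over 25 cycles
# avg_gc_pause_ms 50000000 25    # -> 2.000
```

Both counters are cumulative since the process started, so comparing two samples taken some time apart shows recent behavior better than a single reading.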
-
Check the memory usage of InfluxDB index
show stats for 'indexes'
-
Monitor InfluxDB memory usage
Run the following command, replacing PID with the InfluxDB process ID, to sample memory usage every 5 seconds:
pidstat -rh -p PID 5
If memory usage is too high or GC runs too frequently, increase the memory available to InfluxDB.
Tip: Install InfluxDB on a dedicated machine with plenty of memory. Read and write speed depend on the indexes, which are stored in memory.
-
Disk Monitoring
Run the following command, replacing PID with the InfluxDB process ID, to sample disk I/O every 5 seconds:
pidstat -d -p PID 5
If the disk read/write load is too high, consider placing the WAL directory and the data directory on different disks to reduce contention between read and write operations.
- Open the configuration file for editing:
vi /etc/default/influxdb
- Modify the following properties in the [data] section so that the data directory and the WAL directory point to different disks:
[data]
dir = "/var/lib/influxdb/data"
wal-dir = "/var/lib/influxdb/wal"
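After changing the two paths, you can check whether they really sit on different filesystems. The sketch below compares the device that df reports for each directory. The function names device_of and same_device are hypothetical.

```shell
# device_of DIR: print the filesystem (device) that holds DIR.
device_of() {
  df -P "$1" | awk 'NR == 2 { print $1 }'
}

# same_device DIR_A DIR_B: succeed if both directories are on the same filesystem.
same_device() {
  [ "$(device_of "$1")" = "$(device_of "$2")" ]
}

# Example with the paths shown above:
# same_device /var/lib/influxdb/data /var/lib/influxdb/wal && echo "still on one filesystem"
```

Note that two partitions of one physical disk show up as different devices here, so a negative result does not by itself prove that the directories are on separate disks.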
-
Read/Write Response Time
-
Write:
SELECT non_negative_derivative(percentile("writeReqDurationNs", 99)) / non_negative_derivative(max("writeReq")) / (1000 * 1000) AS "Write Request"
FROM "_internal".."httpd"
WHERE time > now() - 10d
GROUP BY time(1h) fill(0)
-
Read:
SELECT non_negative_derivative(percentile("queryReqDurationNs", 99)) / non_negative_derivative(max("queryReq")) / (1000 * 1000) AS "Query Request"
FROM "_internal".."httpd"
WHERE time > now() - 10d
GROUP BY time(1h)
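The write and read queries share one shape: the change in cumulative request duration divided by the change in request count, converted from nanoseconds to milliseconds. As a sketch, they can be generated from a single template and passed to the influx client. The function name latency_query and its argument order are assumptions, and this version always appends fill(0).

```shell
# latency_query DURATION_FIELD COUNT_FIELD ALIAS WINDOW: print an InfluxQL query
# for the per-request latency in milliseconds, bucketed by hour.
latency_query() {
  printf 'SELECT non_negative_derivative(percentile("%s", 99)) / non_negative_derivative(max("%s")) / (1000 * 1000) AS "%s" FROM "_internal".."httpd" WHERE time > now() - %s GROUP BY time(1h) fill(0)\n' \
    "$1" "$2" "$3" "$4"
}

# Examples:
# influx -execute "$(latency_query writeReqDurationNs writeReq 'Write Request' 10d)"
# influx -execute "$(latency_query queryReqDurationNs queryReq 'Query Request' 10d)"
```

Changing the last argument, for example from 10d to 1d, narrows the look-back window without touching the rest of the query.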