Index Optimization
Kylin recommendation engine includes index optimization as one of its core capabilities. Besides recommending new high value indexes to reduce query response times, Kylin also recommends deleting low value indexes discovered following index optimization strategies to help reduce index building pressure and storage costs. This article introduces index optimization strategies.
Inclusion strategy (recommended)
When the Kylin recommendation engine detects an inclusion relationship among indexes, it generates recommendations to delete these indexes. These recommendations do not impact query efficiency.
-
For table indexes, the inclusion relationship must satisfy all of the following conditions:
- There is an inclusion relationship among dimensions, and the dimensions are in the same order. For example, if A, B, and C represent three dimensions, then the table index [A, B, C] has an inclusion relationship with the table indexes [A, C] and [B, C], but not with [C, A] or [C, B].
- The table indexes share the same
ShardBy
column.
-
For aggregate indexes, the inclusion relationship must satisfy all of the following conditions:
- The dimensions are exactly the same, and there is an inclusion relationship between the measure sets. For example, if A, B, and C represent three dimensions, and m1, m2, and m3 represent three measures, then the aggregate index [A, B, C, m1, m2, m3] has an inclusion relationship with the aggregate indexes [A, B, C, m1] and [A, B, C, m2, m3], but not with [A, C, m2] or [A, C, m1, m3].
- The aggregate indexes share the same
ShardBy
column.
[!NOTE]
There is no inclusion relation between table indexes and aggregate indexes.
Similarity strategy
When the Kylin recommendation engine detects similarities among aggregate indexes, it will generate a recommendation to optimize these indexes. This recommendation may affect query efficiency.
Similar indexes must satisfy all of the following conditions:
-
Parent-Child Relationship: Aggregate indexes generated by aggregate groups must have a parent-child relationship, meaning their dimensions have an inclusion relation. For example, if A, B, and C represent three dimensions, and m1 and m2 represent two measures, then the aggregate index [A, B, C, m1, m2] has a parent-child relationship with [A, C, m1, m2].
-
Row Number Ratio: The ratio of the row number of child indexes to that of the parent indexes must exceed a certain threshold. By default, this threshold is 0.9. You can adjust the threshold by modifying the
kylin.index.similarity-ratio-threshold
parameter. -
Row Number Difference: The difference in row numbers between parent and child indexes must be smaller than a defined threshold. By default, this threshold is 100 million. You can change the threshold by modifying the
kylin.index.beyond-similarity-bias-threshold
parameter. -
Shared ShardBy Column: Parent and child indexes must share the same
ShardBy
column.
Low-frequency strategy
The Kylin recommendation engine automatically analyzes how often an index is hit within a specific time range, such as daily or weekly. If the number of hits is below a certain threshold, it generates a recommendation to delete the index. This recommendation may affect query efficiency.
To adjust the low-frequency strategy, log in to Kylin, click the project **Setting **page, and modify the strategy in the Index Optimization section.
Apply index optimization strategies
You can modify the kylin.index.optimization-level
parameter to apply different optimization strategies for both custom indexes and those recommended by the recommendation engine.
The following table lists all the parameter values supported by Kylin and their corresponding index optimization strategies.
Parameter value | Description |
---|---|
0 | Custom indexes or indexes recommended by the recommendation engine will not be evaluated for optimization. |
1 | Custom indexes will not be evaluated for optimization. Indexes recommended by the recommendation engine will be evaluated for optimization using the inclusion strategy. |
2 (default) | Custom indexes will not be evaluated for optimization. Indexes recommended by the recommendation engine will be evaluated for optimization using the inclusion and low-frequency strategies. |
3 | Custom indexes will be evaluated for optimization using the similarity strategy. Indexes recommended by the recommendation engine will be evaluated for optimization using the inclusion and low-frequency strategies. |
FAQ
-
Question: After approving indexes recommended by the recommendation engine, I also created the same indexes by modifying an aggregate group. Will Kylin store both indexes?
Answer: No. To avoid additional storage costs from duplicate index builds, Kylin only retains the indexes recommended by the recommendation engine and changes their source to custom indexes.
-
Question: If I followed a recommendation to delete indexes, are there any other files I need to delete?
Answer: This operation only deletes the index metadata and does not remove the index data stored in HDFS or other storage engine. You can clean these files either automatically or manually. For more details, see Junk File Clean.
-
Question: What is a ShardBy column?
Answer: ShardBy columns are used to evenly distribute data across multiple shards, increasing concurrency and query efficiency. Typically, columns with relatively high cardinality are chosen as ShardBy columns.