
Install Validation

Kylin uses the open source SSB (Star Schema Benchmark) dataset, designed for star schema OLAP scenarios, as its test dataset. You can verify that the installation succeeded by running a script that imports the SSB dataset into Hive. The SSB data is loaded from multiple CSV files.

This section verifies the installation in three steps: importing sample data, validating product functions, and validating query analysis.

Import Sample Data

Run the following command to import the sample data:

$KYLIN_HOME/bin/sample.sh

The script creates a database named SSB with 6 Hive tables and then imports the sample data into them.

When the script finishes successfully, the console prints the following message:

Sample hive tables are created successfully
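If you prefer to check the import from a script rather than by reading the console, the sketch below runs `sample.sh` and looks for the success message quoted above. The function name `run_sample_import` and the default log path `/tmp/kylin_sample_import.log` are hypothetical choices for this example, and success is judged only by the presence of that message, not by the script's exit code.

```shell
# Hypothetical helper: run the sample import and look for the success message.
# The default log path /tmp/kylin_sample_import.log is an assumption.
run_sample_import() {
  local log_file="${1:-/tmp/kylin_sample_import.log}"
  if [ -z "${KYLIN_HOME:-}" ]; then
    echo "KYLIN_HOME is not set" >&2
    return 1
  fi
  # Print the console output and keep a copy in the log file.
  "$KYLIN_HOME/bin/sample.sh" 2>&1 | tee "$log_file"
  # Success is judged only by the message the script prints.
  if grep -q "Sample hive tables are created successfully" "$log_file"; then
    echo "Sample import succeeded"
  else
    echo "Success message not found; see $log_file" >&2
    return 1
  fi
}
```

After sourcing the snippet in a shell where `KYLIN_HOME` is set, call `run_sample_import` (optionally passing a log file path); a non-zero return status means the message was not found.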

Several sections of this manual use the SSB dataset as sample data to introduce Kylin. The SSB dataset simulates transaction data of an online store; see Sample Dataset for details. The tables are briefly introduced below.

| Table | Description | Details |
| --- | --- | --- |
| CUSTOMER | Customer information | Includes customer name, address, contact information, etc. |
| DATES | Order date | Includes an order's specific date, week, month, year, etc. |
| LINEORDER | Order information | Includes basic order information such as order date, order amount, order revenue, supplier ID, commodity ID, customer ID, etc. |
| PART | Product information | Includes basic product information such as product name, category, brand, etc. |
| P_LINEORDER | View based on the order information table | Includes all columns of the order information table plus the additional columns defined in the view |
| SUPPLIER | Supplier information | Includes supplier name, address, contact information, etc. |
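To confirm from the command line that the tables listed above exist in Hive, you could use a check like the following. It is a sketch that assumes the `hive` CLI is on your `PATH` (Beeline users would need to adapt the command); the function name `check_ssb_tables` is hypothetical. Hive's `SHOW TABLES` lists views as well as tables, so P_LINEORDER is included.

```shell
# Hypothetical helper: confirm that the six SSB tables exist in Hive.
# Assumes the `hive` CLI is on PATH; adapt the command if you use Beeline.
check_ssb_tables() {
  local expected="customer dates lineorder p_lineorder part supplier"
  local actual t missing=0
  # -S (silent) prints only query output; SHOW TABLES lists views too.
  # awk keeps the first field and lowercases it before comparison.
  actual=$(hive -S -e "SHOW TABLES IN ssb;" | awk '{print tolower($1)}')
  for t in $expected; do
    if ! printf '%s\n' "$actual" | grep -qx "$t"; then
      echo "Missing table: ssb.$t" >&2
      missing=1
    fi
  done
  # Returns non-zero if any expected table was not found.
  [ "$missing" -eq 0 ] && echo "All SSB tables found"
}
```

If Hive cannot be reached, the output is empty and every table is reported as missing, so a failed connection also shows up as a failed check.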

Validate Product Functions

On the Data Asset -> Model page, you should see a sample model with a storage size greater than 0.00 KB, which indicates that data has been loaded into this model.

[Screenshot: model list]

On the Monitor -> Job page, you should see that all jobs have completed successfully.

[Screenshot: job monitor]

Validate Query Analysis

Once the metadata is loaded, the 6 sample Hive tables appear in the left panel of the Insight page, and you can enter queries against them. For example, the following SQL statement calculates the total revenue of each product (LO_PARTKEY) for orders placed between 1993-06-01 and 1994-06-01, and sorts the products by total revenue in descending order:

SELECT LO_PARTKEY, SUM(LO_REVENUE) AS TOTAL_REVENUE
FROM SSB.P_LINEORDER
WHERE LO_ORDERDATE BETWEEN '1993-06-01' AND '1994-06-01'
GROUP BY LO_PARTKEY
ORDER BY SUM(LO_REVENUE) DESC

The query result is displayed on the Insight page, which also shows that the query hit the sample model.

[Screenshot: query result]

You can also run the same SQL statement in Hive to cross-check the result and compare the response time of the query.
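A minimal sketch of the Hive side of that comparison is shown below. It assumes the `hive` CLI is on your `PATH`; the function name `run_in_hive`, the variable `SSB_SAMPLE_SQL`, and the default output path `/tmp/ssb_hive_result.tsv` are hypothetical. Timing uses whole seconds from `date +%s`, which is coarse but enough to compare against the response time shown on the Insight page.

```shell
# Hypothetical helper: run the sample query in Hive and report row count and
# elapsed wall-clock time. Assumes the `hive` CLI is on PATH.
SSB_SAMPLE_SQL="SELECT LO_PARTKEY, SUM(LO_REVENUE) AS TOTAL_REVENUE
FROM SSB.P_LINEORDER
WHERE LO_ORDERDATE BETWEEN '1993-06-01' AND '1994-06-01'
GROUP BY LO_PARTKEY
ORDER BY SUM(LO_REVENUE) DESC"

run_in_hive() {
  local out_file="${1:-/tmp/ssb_hive_result.tsv}"
  local start end rows
  start=$(date +%s)
  # -S suppresses Hive's log output so the file holds only result rows.
  if ! hive -S -e "$SSB_SAMPLE_SQL" > "$out_file"; then
    echo "Hive query failed" >&2
    return 1
  fi
  end=$(date +%s)
  # Arithmetic expansion strips any padding that wc adds.
  rows=$(( $(wc -l < "$out_file") ))
  echo "Hive returned $rows rows in $((end - start)) s"
}
```

The saved result file can then be compared row by row with the result exported from the Insight page.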