Options
Options below can be used when create table for analytic engine
-
enable_ttl
,bool
. When enable TTL on a table, rows older thanttl
will be deleted and can't be querid, defaulttrue
-
ttl
,duration
, lifetime of a row, only used whenenable_ttl
istrue
. default7d
. -
storage_format
,string
. The underlying column's format. Availiable values:columnar
, defaulthybrid
The meaning of those two values are in Storage format section.
Storage Format
There are mainly two formats supported in analytic engine. One is columnar
, which is the traditional columnar format, with one table column in one physical column:
| Timestamp | Device ID | Status Code | Tag 1 | Tag 2 |
| --------- |---------- | ----------- | ----- | ----- |
| 12:01 | A | 0 | v1 | v1 |
| 12:01 | B | 0 | v2 | v2 |
| 12:02 | A | 0 | v1 | v1 |
| 12:02 | B | 1 | v2 | v2 |
| 12:03 | A | 0 | v1 | v1 |
| 12:03 | B | 0 | v2 | v2 |
| ..... | | | | |
The other one is hybrid
, an experimental format used to simulate row-oriented storage in columnar storage to accelerate traditional time-series query.
In traditional time-series user cases like IoT or DevOps, queries will typically first group their result by series id(or device id), then by timestamp. In order to achieve good performance in those scenarios, the data physical layout should match this style, so the hybrid
format is proposed like this:
| Device ID | Timestamp | Status Code | Tag 1 | Tag 2 | minTime | maxTime |
|-----------|---------------------|-------------|-------|-------|---------|---------|
| A | [12:01,12:02,12:03] | [0,0,0] | v1 | v1 | 12:01 | 12:03 |
| B | [12:01,12:02,12:03] | [0,1,0] | v2 | v2 | 12:01 | 12:03 |
| ... | | | | | | |
- Within one file, rows belonging to the same primary key(eg: series/device id) are collapsed into one row
- The columns besides primary key are divided into two categories:
collapsible
, those columns will be collapsed into a list. Used to encodefields
in time-series table- Note: only fixed-length type is supported now
non-collapsible
, those columns should only contain one distinct value. Used to encodetags
in time-series table- Note: only string type is supported now
- Two more columns are added,
minTime
andmaxTime
. Those are used to cut unnecessary rows out in query.- Note: Not implemented yet.
Example
CREATE TABLE `device` (
`ts` timestamp NOT NULL,
`tag1` string tag,
`tag2` string tag,
`value1` double,
`value2` int,
timestamp KEY (ts)) ENGINE=Analytic
with (
enable_ttl = 'false',
storage_format = 'hybrid'
);
This will create a table with hybrid format, users can inspect data format with parquet-tools. The table above should have following parquet schema:
message arrow_schema {
optional group ts (LIST) {
repeated group list {
optional int64 item (TIMESTAMP(MILLIS,false));
}
}
required int64 tsid (INTEGER(64,false));
optional binary tag1 (STRING);
optional binary tag2 (STRING);
optional group value1 (LIST) {
repeated group list {
optional double item;
}
}
optional group value2 (LIST) {
repeated group list {
optional int32 item;
}
}
}