DorisDB vs ClickHouse SSB对比测试

DorisDB vs ClickHouse SSB对比测试

TL;DR

  1. 进行本次测试时对DorisDB了解甚微
  2. 本次测试由于服务器资源有限, 没有严格遵循单一变量原则进行测试
  3. 本次测试有一定参考意义

数据导入速度

  • ClickHouse: 3500s
  • DorisDB: 5160s

数据压缩情况(通过磁盘占用空间比较)

  • ClickHouse: 85.2G
  • DorisDB: 132G

查询速度

单表查询

单表

DorisDB1 DorisDB2 ClickHouse1 ClickHouse2
Q1.1 350 290 226 195
Q1.2 270 190 34 63
Q1.3 310 240 43 22
Q2.1 410 370 1,723 1,791
Q2.2 780 720 1,463 1,470
Q2.3 340 280 659 1,337
Q3.1 1,560 860 3,488 1,254
Q3.2 1,080 790 1,272 966
Q3.3 250 290 979 889
Q3.4 230 260 36 20
Q4.1 870 720 5,067 2,791
Q4.2 720 490 804 752
Q4.3 510 380 561 482

多表查询

多表

DorisDB1 DorisDB2 ClickHouse1 ClickHouse2
Q1.1 450 490 1,496 1,424
Q1.2 410 450 1,366 659
Q1.3 510 340 678 1,377
Q2.1 1,560 1,600 4,360 2,667
Q2.2 1,690 1,060 4,498 1,554
Q2.3 780 1,150 2,569 2,577
Q3.1 3,480 3,700 10,190 12,960
Q3.2 1,320 1,850 5,926 5,743
Q3.3 1,030 1,040 3,445 3,300
Q3.4 1,330 1,170 3,455 3,330
Q4.1 3,480 3,750 15,560 9,494
Q4.2 2,830 3,170 16,109 18,048
Q4.3 1,560 2,140 15,685 14,838

环境信息

ClickHouse: 3台 华为云ECS 高性能计算型 | h3.xlarge.2 | 4vCPUs | 8GB | 超高IO SSD

DorisDB: 3台 华为云ECS 高性能计算型 | h3.2xlarge.4 | 8vCPUs | 32GB | 超高IO SSD

由于资源紧张, DorisDB所在服务器上还部署了rc mysql, 但过年期间无人使用, 实际可用内存16G.

DorisDB: DorisDB-SE-1.12.1 3Fe 3Be, fe和be部署在一起

ClickHouse: 20.10.3.30, 三分片两副本混合部署, 部署方法详见ClickHouse集群多实例部署
img

注意

实际数据导入后DorsiDB和ClickHouse除lineorder_flat外数据无任何差异

lineorder_flat:

  • DorsiDB: 546669614
  • ClickHouse: 622259902

DorisDB

部署略

构建数据

首先下载ssb-poc工具包并编译

1
2
3
4
wget http://dorisdb-public.oss-cn-zhangjiakou.aliyuncs.com/ssb-poc-0.9.zip
unzip ssb-poc-0.9.zip
cd ssb-poc
make && make install

所有相关工具安装到output目录。

进入output目录,生成数据

1
2
cd output
bin/gen-ssb.sh 100 data_dir

建表

修改配置文件conf/doris.conf,指定脚本操作的Doris集群地址

1
2
3
4
5
6
7
8
9
10
11
12
13
 # for mysql cmd
mysql_host: 192.168.1.1
mysql_port: 9030
mysql_user: root
mysql_password:
doris_db: ssb

# cluster ports
http_port: 8030
be_heartbeat_port: 9050
broker_port: 8000

...

执行脚本建表

1
bin/create_db_table.sh ddl_100

我这里建表跑建表脚本报错了, 改为手动建表, 参考http://doc.dorisdb.com/2146807

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
CREATE TABLE IF NOT EXISTS `lineorder` (
`lo_orderkey` int(11) NOT NULL COMMENT "",
`lo_linenumber` int(11) NOT NULL COMMENT "",
`lo_custkey` int(11) NOT NULL COMMENT "",
`lo_partkey` int(11) NOT NULL COMMENT "",
`lo_suppkey` int(11) NOT NULL COMMENT "",
`lo_orderdate` int(11) NOT NULL COMMENT "",
`lo_orderpriority` varchar(16) NOT NULL COMMENT "",
`lo_shippriority` int(11) NOT NULL COMMENT "",
`lo_quantity` int(11) NOT NULL COMMENT "",
`lo_extendedprice` int(11) NOT NULL COMMENT "",
`lo_ordtotalprice` int(11) NOT NULL COMMENT "",
`lo_discount` int(11) NOT NULL COMMENT "",
`lo_revenue` int(11) NOT NULL COMMENT "",
`lo_supplycost` int(11) NOT NULL COMMENT "",
`lo_tax` int(11) NOT NULL COMMENT "",
`lo_commitdate` int(11) NOT NULL COMMENT "",
`lo_shipmode` varchar(11) NOT NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`lo_orderkey`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`lo_orderkey`) BUCKETS 96
PROPERTIES (
"replication_num" = "1",
"colocate_with" = "group1",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);


CREATE TABLE IF NOT EXISTS `customer` (
`c_custkey` int(11) NOT NULL COMMENT "",
`c_name` varchar(26) NOT NULL COMMENT "",
`c_address` varchar(41) NOT NULL COMMENT "",
`c_city` varchar(11) NOT NULL COMMENT "",
`c_nation` varchar(16) NOT NULL COMMENT "",
`c_region` varchar(13) NOT NULL COMMENT "",
`c_phone` varchar(16) NOT NULL COMMENT "",
`c_mktsegment` varchar(11) NOT NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`c_custkey`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`c_custkey`) BUCKETS 12
PROPERTIES (
"replication_num" = "1",
"colocate_with" = "groupa2",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);


CREATE TABLE IF NOT EXISTS `dates` (
`d_datekey` int(11) NOT NULL COMMENT "",
`d_date` varchar(20) NOT NULL COMMENT "",
`d_dayofweek` varchar(10) NOT NULL COMMENT "",
`d_month` varchar(11) NOT NULL COMMENT "",
`d_year` int(11) NOT NULL COMMENT "",
`d_yearmonthnum` int(11) NOT NULL COMMENT "",
`d_yearmonth` varchar(9) NOT NULL COMMENT "",
`d_daynuminweek` int(11) NOT NULL COMMENT "",
`d_daynuminmonth` int(11) NOT NULL COMMENT "",
`d_daynuminyear` int(11) NOT NULL COMMENT "",
`d_monthnuminyear` int(11) NOT NULL COMMENT "",
`d_weeknuminyear` int(11) NOT NULL COMMENT "",
`d_sellingseason` varchar(14) NOT NULL COMMENT "",
`d_lastdayinweekfl` int(11) NOT NULL COMMENT "",
`d_lastdayinmonthfl` int(11) NOT NULL COMMENT "",
`d_holidayfl` int(11) NOT NULL COMMENT "",
`d_weekdayfl` int(11) NOT NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`d_datekey`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`d_datekey`) BUCKETS 1
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"colocate_with" = "groupa3",
"storage_format" = "DEFAULT"
);

CREATE TABLE IF NOT EXISTS `supplier` (
`s_suppkey` int(11) NOT NULL COMMENT "",
`s_name` varchar(26) NOT NULL COMMENT "",
`s_address` varchar(26) NOT NULL COMMENT "",
`s_city` varchar(11) NOT NULL COMMENT "",
`s_nation` varchar(16) NOT NULL COMMENT "",
`s_region` varchar(13) NOT NULL COMMENT "",
`s_phone` varchar(16) NOT NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`s_suppkey`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`s_suppkey`) BUCKETS 12
PROPERTIES (
"replication_num" = "1",
"colocate_with" = "groupa4",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);

CREATE TABLE IF NOT EXISTS `part` (
`p_partkey` int(11) NOT NULL COMMENT "",
`p_name` varchar(23) NOT NULL COMMENT "",
`p_mfgr` varchar(7) NOT NULL COMMENT "",
`p_category` varchar(8) NOT NULL COMMENT "",
`p_brand` varchar(10) NOT NULL COMMENT "",
`p_color` varchar(12) NOT NULL COMMENT "",
`p_type` varchar(26) NOT NULL COMMENT "",
`p_size` int(11) NOT NULL COMMENT "",
`p_container` varchar(11) NOT NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`p_partkey`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`p_partkey`) BUCKETS 12
PROPERTIES (
"replication_num" = "1",
"colocate_with" = "groupa5",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);

CREATE TABLE IF NOT EXISTS `lineorder_flat` (
`LO_ORDERKEY` int(11) NOT NULL COMMENT "",
`LO_ORDERDATE` date NOT NULL COMMENT "",
`LO_LINENUMBER` tinyint(4) NOT NULL COMMENT "",
`LO_CUSTKEY` int(11) NOT NULL COMMENT "",
`LO_PARTKEY` int(11) NOT NULL COMMENT "",
`LO_SUPPKEY` int(11) NOT NULL COMMENT "",
`LO_ORDERPRIORITY` varchar(100) NOT NULL COMMENT "",
`LO_SHIPPRIORITY` tinyint(4) NOT NULL COMMENT "",
`LO_QUANTITY` tinyint(4) NOT NULL COMMENT "",
`LO_EXTENDEDPRICE` int(11) NOT NULL COMMENT "",
`LO_ORDTOTALPRICE` int(11) NOT NULL COMMENT "",
`LO_DISCOUNT` tinyint(4) NOT NULL COMMENT "",
`LO_REVENUE` int(11) NOT NULL COMMENT "",
`LO_SUPPLYCOST` int(11) NOT NULL COMMENT "",
`LO_TAX` tinyint(4) NOT NULL COMMENT "",
`LO_COMMITDATE` date NOT NULL COMMENT "",
`LO_SHIPMODE` varchar(100) NOT NULL COMMENT "",
`C_NAME` varchar(100) NOT NULL COMMENT "",
`C_ADDRESS` varchar(100) NOT NULL COMMENT "",
`C_CITY` varchar(100) NOT NULL COMMENT "",
`C_NATION` varchar(100) NOT NULL COMMENT "",
`C_REGION` varchar(100) NOT NULL COMMENT "",
`C_PHONE` varchar(100) NOT NULL COMMENT "",
`C_MKTSEGMENT` varchar(100) NOT NULL COMMENT "",
`S_NAME` varchar(100) NOT NULL COMMENT "",
`S_ADDRESS` varchar(100) NOT NULL COMMENT "",
`S_CITY` varchar(100) NOT NULL COMMENT "",
`S_NATION` varchar(100) NOT NULL COMMENT "",
`S_REGION` varchar(100) NOT NULL COMMENT "",
`S_PHONE` varchar(100) NOT NULL COMMENT "",
`P_NAME` varchar(100) NOT NULL COMMENT "",
`P_MFGR` varchar(100) NOT NULL COMMENT "",
`P_CATEGORY` varchar(100) NOT NULL COMMENT "",
`P_BRAND` varchar(100) NOT NULL COMMENT "",
`P_COLOR` varchar(100) NOT NULL COMMENT "",
`P_TYPE` varchar(100) NOT NULL COMMENT "",
`P_SIZE` tinyint(4) NOT NULL COMMENT "",
`P_CONTAINER` varchar(100) NOT NULL COMMENT ""
) ENGINE=OLAP
DUPLICATE KEY(`LO_ORDERKEY`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`LO_ORDERKEY`) BUCKETS 192
PROPERTIES (
"replication_num" = "1",
"colocate_with" = "groupxx1",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);

连接一个fe执行以上sql即可

导入数据

DorisDB测试中使用python脚本导入数据, 需要安装pymysql库

使用Stream load导入单表数据

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
(myrecover) [root@bj2-mysql-rc-drill-02 output]# time bin/stream_load.sh data_dir
stream load start. table: lineorder, path: data_dir/lineorder.tbl.1
stream load start. table: lineorder, path: data_dir/lineorder.tbl.2
stream load start. table: lineorder, path: data_dir/lineorder.tbl.3
stream load start. table: lineorder, path: data_dir/lineorder.tbl.4
stream load start. table: lineorder, path: data_dir/lineorder.tbl.5
stream load start. table: lineorder, path: data_dir/lineorder.tbl.6
stream load start. table: lineorder, path: data_dir/lineorder.tbl.7
stream load start. table: lineorder, path: data_dir/lineorder.tbl.8
stream load start. table: lineorder, path: data_dir/lineorder.tbl.9
stream load start. table: lineorder, path: data_dir/lineorder.tbl.10
...

stream load success. table: lineorder, path: data_dir/lineorder.tbl.100
stream load success. table: lineorder, path: data_dir/lineorder.tbl.97
stream load success. table: lineorder, path: data_dir/lineorder.tbl.99
stream load success. table: lineorder, path: data_dir/lineorder.tbl.93
stream load success. table: lineorder, path: data_dir/lineorder.tbl.98
stream load success. table: lineorder, path: data_dir/lineorder.tbl.94
stream load success. table: lineorder, path: data_dir/lineorder.tbl.92
stream load success. table: lineorder, path: data_dir/lineorder.tbl.95
stream load success. table: lineorder, path: data_dir/lineorder.tbl.91
stream load success. table: lineorder, path: data_dir/lineorder.tbl.96
stream load start. table: customer, path: data_dir/customer.tbl
stream load success. table: customer, path: data_dir/customer.tbl
stream load start. table: dates, path: data_dir/dates.tbl
stream load success. table: dates, path: data_dir/dates.tbl
stream load start. table: part, path: data_dir/part.tbl
stream load success. table: part, path: data_dir/part.tbl
stream load start. table: supplier, path: data_dir/supplier.tbl
stream load success. table: supplier, path: data_dir/supplier.tbl

real 82m31.323s
user 0m6.385s
sys 0m36.761s

(myrecover) [root@bj2-mysql-rc-drill-02 output]# time bin/flat_insert.sh
sql: ssb_flat_insert start
sql: ssb_flat_insert success

real 1m36.697s
user 0m0.078s
sys 0m0.022s

插入数据到宽表lineorder_flat

1
2
3
4
5
6
7
(myrecover) [root@bj2-mysql-rc-drill-02 output]# time bin/flat_insert.sh
sql: ssb_flat_insert start
sql: ssb_flat_insert success

real 2m37.530s
user 0m0.063s
sys 0m0.023s

有一个不理解的现象是, flat_insert.sh已经执行完毕, 但是查看lineorder_flat表行数时, 发现其值是在不断增大

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
mysql> select count(*) from lineorder_flat;
+-----------+
| count(*) |
+-----------+
| 182256332 |
+-----------+
1 row in set (0.10 sec)

mysql> select count(*) from lineorder_flat;
+-----------+
| count(*) |
+-----------+
| 364316982 |
+-----------+
1 row in set (0.16 sec)

mysql> select count(*) from lineorder_flat;
+-----------+
| count(*) |
+-----------+
| 546669614 |
+-----------+
1 row in set (0.41 sec)

可以看到DorisDB数据导入耗时约86分钟

最终数据

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
mysql> select count(*) from customer       ;
+----------+
| count(*) |
+----------+
| 3000000 |
+----------+
1 row in set (0.02 sec)

mysql> select count(*) from dates ;
+----------+
| count(*) |
+----------+
| 2556 |
+----------+
1 row in set (0.01 sec)

mysql> select count(*) from lineorder ;
+-----------+
| count(*) |
+-----------+
| 600037902 |
+-----------+
1 row in set (0.53 sec)

mysql> select count(*) from lineorder_flat ;

+-----------+
| count(*) |
+-----------+
| 546669614 |
+-----------+
1 row in set (0.40 sec)

mysql> select count(*) from part ;
+----------+
| count(*) |
+----------+
| 1400000 |
+----------+
1 row in set (0.01 sec)

mysql> select count(*) from supplier ;
+----------+
| count(*) |
+----------+
| 200000 |
+----------+
1 row in set (0.01 sec)

数据占用空间

1
2
3
4
du -sh /data/DorisDB-SE-1.12.1/be/storage/data/
44G /data/DorisDB-SE-1.12.1/be/storage/data/

44*3 = 132G

单表查询测试

1
set global  parallel_fragment_exec_instance_num  = 4;

Q1.1

1
2
3
4
5
6
7
8
9
10
SELECT sum(lo_extendedprice * lo_discount) AS `revenue` 
FROM lineorder_flat
WHERE lo_orderdate >= '1993-01-01' and lo_orderdate <= '1993-12-31' AND lo_discount BETWEEN 1 AND 3 AND lo_quantity < 25;
+----------------+
| revenue |
+----------------+
| 44652567249651 |
+----------------+
1 row in set (0.35 sec)
1 row in set (0.29 sec)

Q1.2

1
2
3
4
5
6
7
8
9
SELECT sum(lo_extendedprice * lo_discount) AS revenue FROM lineorder_flat  
WHERE lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-01-31' AND lo_discount BETWEEN 4 AND 6 AND lo_quantity BETWEEN 26 AND 35;
+---------------+
| revenue |
+---------------+
| 9624332170119 |
+---------------+
1 row in set (0.27 sec)
1 row in set (0.19 sec)

Q1.3

1
2
3
4
5
6
7
8
9
10
11
SELECT sum(lo_extendedprice * lo_discount) AS revenue 
FROM lineorder_flat
WHERE weekofyear(lo_orderdate) = 6 AND lo_orderdate >= '1994-01-01' and lo_orderdate <= '1994-12-31'
AND lo_discount BETWEEN 5 AND 7 AND lo_quantity BETWEEN 26 AND 35;
+---------------+
| revenue |
+---------------+
| 2611093671163 |
+---------------+
1 row in set (0.31 sec)
1 row in set (0.24 sec)

Q2.1

1
2
3
4
5
6
7
8
9
10
SELECT sum(lo_revenue), year(lo_orderdate) AS year,  p_brand 
FROM lineorder_flat
WHERE p_category = 'MFGR#12' AND s_region = 'AMERICA'
GROUP BY year, p_brand
ORDER BY year, p_brand;

...

240 rows in set (0.41 sec)
240 rows in set (0.37 sec)

Q2.2

1
2
3
4
5
6
7
8
9
SELECT 
sum(lo_revenue), year(lo_orderdate) AS year, p_brand
FROM lineorder_flat
WHERE p_brand >= 'MFGR#2221' AND p_brand <= 'MFGR#2228' AND s_region = 'ASIA'
GROUP BY year, p_brand
ORDER BY year, p_brand;

48 rows in set (0.78 sec)
48 rows in set (0.72 sec)

Q2.3

1
2
3
4
5
6
7
8
SELECT sum(lo_revenue),  year(lo_orderdate) AS year, p_brand 
FROM lineorder_flat
WHERE p_brand = 'MFGR#2239' AND s_region = 'EUROPE'
GROUP BY year, p_brand
ORDER BY year, p_brand;

6 rows in set (0.34 sec)
6 rows in set (0.28 sec)

Q3.1

1
2
3
4
5
6
7
SELECT c_nation, s_nation,  year(lo_orderdate) AS year, sum(lo_revenue) AS revenue FROM lineorder_flat 
WHERE c_region = 'ASIA' AND s_region = 'ASIA' AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_nation, s_nation, year
ORDER BY year ASC, revenue DESC;

150 rows in set (1.56 sec)
150 rows in set (0.86 sec)

Q3.2

1
2
3
4
5
6
7
8
SELECT  c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue
FROM lineorder_flat
WHERE c_nation = 'UNITED STATES' AND s_nation = 'UNITED STATES' AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;

600 rows in set (1.08 sec)
600 rows in set (0.79 sec)

Q3.3

1
2
3
4
5
6
7
8
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue 
FROM lineorder_flat
WHERE c_city in ( 'UNITED KI1' ,'UNITED KI5') AND s_city in ( 'UNITED KI1' ,'UNITED KI5') AND lo_orderdate >= '1992-01-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;

24 rows in set (0.25 sec)
24 rows in set (0.29 sec)

Q3.4

1
2
3
4
5
6
7
8
SELECT c_city, s_city, year(lo_orderdate) AS year, sum(lo_revenue) AS revenue 
FROM lineorder_flat
WHERE c_city in ('UNITED KI1', 'UNITED KI5') AND s_city in ( 'UNITED KI1', 'UNITED KI5') AND lo_orderdate >= '1997-12-01' AND lo_orderdate <= '1997-12-31'
GROUP BY c_city, s_city, year
ORDER BY year ASC, revenue DESC;

4 rows in set (0.23 sec)
4 rows in set (0.26 sec)

Q4.1

DorisDB测试文档中对该SQL执行前执行了

1
set vectorized_engine_enable = FALSE; 

但实际发现执行上面语句后反而会慢很多

1
2
3
4
5
6
7
SELECT year(lo_orderdate) AS year, c_nation,  sum(lo_revenue - lo_supplycost) AS profit FROM lineorder_flat 
WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND p_mfgr in ( 'MFGR#1' , 'MFGR#2')
GROUP BY year, c_nation
ORDER BY year ASC, c_nation ASC;

30 rows in set (1 min 24.46 sec)
30 rows in set (1 min 25.10 sec)
1
2
3
4
5
6
7
8
set vectorized_engine_enable = TRUE; 
SELECT year(lo_orderdate) AS year, c_nation, sum(lo_revenue - lo_supplycost) AS profit FROM lineorder_flat
WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND p_mfgr in ( 'MFGR#1' , 'MFGR#2')
GROUP BY year, c_nation
ORDER BY year ASC, c_nation ASC;

30 rows in set (0.87 sec)
30 rows in set (0.72 sec)

我比对了开启和关闭vectorized_engine_enable后的查询结果, 发现是一样的, 这里我就不明白为啥要设置vectorized_engine_enableFalse

Q4.2

1
2
3
4
5
6
7
8
9
SELECT year(lo_orderdate) AS year, 
s_nation, p_category, sum(lo_revenue - lo_supplycost) AS profit
FROM lineorder_flat
WHERE c_region = 'AMERICA' AND s_region = 'AMERICA' AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31' AND p_mfgr in ( 'MFGR#1' , 'MFGR#2')
GROUP BY year, s_nation, p_category
ORDER BY year ASC, s_nation ASC, p_category ASC;

30 rows in set (0.72 sec)
50 rows in set (0.49 sec)

Q4.3

1
2
3
4
5
6
7
8
9
SELECT year(lo_orderdate) AS year, s_city, p_brand, 
sum(lo_revenue - lo_supplycost) AS profit
FROM lineorder_flat
WHERE s_nation = 'UNITED STATES' AND lo_orderdate >= '1997-01-01' and lo_orderdate <= '1998-12-31' AND p_category = 'MFGR#14'
GROUP BY year, s_city, p_brand
ORDER BY year ASC, s_city ASC, p_brand ASC;

400 rows in set (0.51 sec)
400 rows in set (0.38 sec)

多表关联测试

执行

1
set global  parallel_fragment_exec_instance_num  = 8;

Q1.1

1
2
3
4
5
6
select sum(lo_revenue) as revenue
from lineorder join dates on lo_orderdate = d_datekey
where d_year = 1993 and lo_discount between 1 and 3 and lo_quantity < 25;

1 row in set (0.45 sec)
1 row in set (0.49 sec)

Q1.2

1
2
3
4
5
6
7
8
9
select sum(lo_revenue) as revenue
from lineorder
join dates on lo_orderdate = d_datekey
where d_yearmonthnum = 199401
and lo_discount between 4 and 6
and lo_quantity between 26 and 35;

1 row in set (0.41 sec)
1 row in set (0.45 sec)

Q1.3

1
2
3
4
5
6
7
8
9
select sum(lo_revenue) as revenue
from lineorder
join dates on lo_orderdate = d_datekey
where d_weeknuminyear = 6 and d_year = 1994
and lo_discount between 5 and 7
and lo_quantity between 26 and 35;

1 row in set (0.51 sec)
1 row in set (0.34 sec)

Q2.1

1
2
3
4
5
6
7
8
9
10
11
select sum(lo_revenue) as lo_revenue, d_year, p_brand
from lineorder
inner join dates on lo_orderdate = d_datekey
join part on lo_partkey = p_partkey
join supplier on lo_suppkey = s_suppkey
where p_category = 'MFGR#12' and s_region = 'AMERICA'
group by d_year, p_brand
order by d_year, p_brand;

280 rows in set (1.56 sec)
280 rows in set (1.60 sec)

Q2.2

1
2
3
4
5
6
7
8
9
10
11
select sum(lo_revenue) as lo_revenue, d_year, p_brand
from lineorder
join dates on lo_orderdate = d_datekey
join part on lo_partkey = p_partkey
join supplier on lo_suppkey = s_suppkey
where p_brand between 'MFGR#2221' and 'MFGR#2228' and s_region = 'ASIA'
group by d_year, p_brand
order by d_year, p_brand;

56 rows in set (1.69 sec)
56 rows in set (1.06 sec)

Q2.3

1
2
3
4
5
6
7
8
9
10
11
select sum(lo_revenue) as lo_revenue, d_year, p_brand
from lineorder
join dates on lo_orderdate = d_datekey
join part on lo_partkey = p_partkey
join supplier on lo_suppkey = s_suppkey
where p_brand = 'MFGR#2239' and s_region = 'EUROPE'
group by d_year, p_brand
order by d_year, p_brand;

7 rows in set (0.78 sec)
7 rows in set (1.15 sec)

Q3.1

1
2
3
4
5
6
7
8
9
10
11
select c_nation, s_nation, d_year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where c_region = 'ASIA' and s_region = 'ASIA'and d_year >= 1992 and d_year <= 1997
group by c_nation, s_nation, d_year
order by d_year asc, lo_revenue desc;

150 rows in set (3.48 sec)
150 rows in set (3.70 sec)

Q3.2

1
2
3
4
5
6
7
8
9
10
11
12
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where c_nation = 'UNITED STATES' and s_nation = 'UNITED STATES'
and d_year >= 1992 and d_year <= 1997
group by c_city, s_city, d_year
order by d_year asc, lo_revenue desc;

600 rows in set (1.32 sec)
600 rows in set (1.85 sec)

Q3.3

1
2
3
4
5
6
7
8
9
10
11
12
13
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where (c_city='UNITED KI1' or c_city='UNITED KI5')
and (s_city='UNITED KI1' or s_city='UNITED KI5')
and d_year >= 1992 and d_year <= 1997
group by c_city, s_city, d_year
order by d_year asc, lo_revenue desc;

24 rows in set (1.03 sec)
24 rows in set (1.04 sec)

Q3.4

1
2
3
4
5
6
7
8
9
10
11
12
select c_city, s_city, d_year, sum(lo_revenue) as lo_revenue
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
where (c_city='UNITED KI1' or c_city='UNITED KI5') and (s_city='UNITED KI1' or s_city='UNITED KI5') and d_yearmonth
= 'Dec1997'
group by c_city, s_city, d_year
order by d_year asc, lo_revenue desc;

4 rows in set (1.33 sec)
4 rows in set (1.17 sec)

Q4.1

1
2
3
4
5
6
7
8
9
10
11
12
select d_year, c_nation, sum(lo_revenue) - sum(lo_supplycost) as profit
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
join part on lo_partkey = p_partkey
where c_region = 'AMERICA' and s_region = 'AMERICA' and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2')
group by d_year, c_nation
order by d_year, c_nation;

35 rows in set (3.48 sec)
35 rows in set (3.75 sec)

Q4.2

1
2
3
4
5
6
7
8
9
10
11
12
13
14
select d_year, s_nation, p_category, sum(lo_revenue) - sum(lo_supplycost) as profit
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
join part on lo_partkey = p_partkey
where c_region = 'AMERICA'and s_region = 'AMERICA'
and (d_year = 1997 or d_year = 1998)
and (p_mfgr = 'MFGR#1' or p_mfgr = 'MFGR#2')
group by d_year, s_nation, p_category
order by d_year, s_nation, p_category;

100 rows in set (2.83 sec)
100 rows in set (3.17 sec)

Q4.3

1
2
3
4
5
6
7
8
9
10
11
12
13
14
select d_year, s_city, p_brand, sum(lo_revenue) - sum(lo_supplycost) as profit
from lineorder
join dates on lo_orderdate = d_datekey
join customer on lo_custkey = c_custkey
join supplier on lo_suppkey = s_suppkey
join part on lo_partkey = p_partkey
where c_region = 'AMERICA'and s_nation = 'UNITED STATES'
and (d_year = 1997 or d_year = 1998)
and p_category = 'MFGR#14'
group by d_year, s_city, p_brand
order by d_year, s_city, p_brand;

800 rows in set (1.56 sec)
800 rows in set (2.14 sec)

ClickHouse

构建数据

1
2
3
4
5
6
7
8
9
$ git clone https://github.com/vadimtk/ssb-dbgen.git
$ cd ssb-dbgen
$ make

$ ./dbgen -s 100 -T c
$ ./dbgen -s 100 -T l
$ ./dbgen -s 100 -T p
$ ./dbgen -s 100 -T s
$ ./dbgen -s 100 -T d

建表

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
CREATE DATABASE ssb on cluster ck_cluster;

CREATE TABLE ssb.customer_local on cluster ck_cluster
(
C_CUSTKEY UInt32,
C_NAME String,
C_ADDRESS String,
C_CITY LowCardinality(String),
C_NATION LowCardinality(String),
C_REGION LowCardinality(String),
C_PHONE String,
C_MKTSEGMENT LowCardinality(String)
)
ENGINE = ReplicatedMergeTree(
'/clickhouse/ssb/tables/{layer}-{shard}/customer',
'{replica}'
) ORDER BY (C_CUSTKEY) SETTINGS index_granularity = 8192;

CREATE TABLE ssb.customer on cluster ck_cluster AS ssb.customer_local ENGINE = Distributed(
ck_cluster,
ssb,
customer_local,
rand()
);


CREATE TABLE ssb.lineorder_local on cluster ck_cluster
(
LO_ORDERKEY UInt32,
LO_LINENUMBER UInt8,
LO_CUSTKEY UInt32,
LO_PARTKEY UInt32,
LO_SUPPKEY UInt32,
LO_ORDERDATE Date,
LO_ORDERPRIORITY LowCardinality(String),
LO_SHIPPRIORITY UInt8,
LO_QUANTITY UInt8,
LO_EXTENDEDPRICE UInt32,
LO_ORDTOTALPRICE UInt32,
LO_DISCOUNT UInt8,
LO_REVENUE UInt32,
LO_SUPPLYCOST UInt32,
LO_TAX UInt8,
LO_COMMITDATE Date,
LO_SHIPMODE LowCardinality(String)
)
ENGINE = ReplicatedMergeTree(
'/clickhouse/ssb/tables/{layer}-{shard}/lineorder',
'{replica}'
) PARTITION BY toYear(LO_ORDERDATE) ORDER BY (LO_ORDERDATE, LO_ORDERKEY)
SETTINGS index_granularity = 8192;

CREATE TABLE ssb.lineorder on cluster ck_cluster AS ssb.lineorder_local ENGINE = Distributed(
ck_cluster,
ssb,
lineorder_local,
rand()
);

CREATE TABLE ssb.part_local on cluster ck_cluster
(
P_PARTKEY UInt32,
P_NAME String,
P_MFGR LowCardinality(String),
P_CATEGORY LowCardinality(String),
P_BRAND LowCardinality(String),
P_COLOR LowCardinality(String),
P_TYPE LowCardinality(String),
P_SIZE UInt8,
P_CONTAINER LowCardinality(String)
)
ENGINE = ReplicatedMergeTree(
'/clickhouse/ssb/tables/{layer}-{shard}/part',
'{replica}'
) ORDER BY P_PARTKEY SETTINGS index_granularity = 8192;

CREATE TABLE ssb.part on cluster ck_cluster AS ssb.part_local ENGINE = Distributed(
ck_cluster,
ssb,
part_local,
rand()
);


CREATE TABLE ssb.supplier_local on cluster ck_cluster
(
S_SUPPKEY UInt32,
S_NAME String,
S_ADDRESS String,
S_CITY LowCardinality(String),
S_NATION LowCardinality(String),
S_REGION LowCardinality(String),
S_PHONE String
)
ENGINE = ReplicatedMergeTree(
'/clickhouse/ssb/tables/{layer}-{shard}/supplier',
'{replica}'
) ORDER BY S_SUPPKEY SETTINGS index_granularity = 8192;

CREATE TABLE ssb.supplier on cluster ck_cluster AS ssb.supplier_local ENGINE = Distributed(
ck_cluster,
ssb,
supplier_local,
rand()
);


CREATE TABLE ssb.dates_local on cluster ck_cluster
(
D_DATEKEY UInt32,
D_DATE String,
D_DAYOFWEEK String,
D_MONTH String,
D_YEAR UInt32,
D_YEARMONTHNUM UInt32,
D_YEARMONTH String,
D_DAYNUMINWEEK UInt32,
D_DAYNUMINMONTH UInt32,
D_DAYNUMINYEAR UInt32,
D_MONTHNUMINYEAR UInt32,
D_WEEKNUMINYEAR UInt32,
D_SELLINGSEASON String,
D_LASTDAYINWEEKFL UInt32,
D_LASTDAYINMONTHFL UInt32,
D_HOLIDAYFL UInt32,
D_WEEKDAYFL UInt32
)
ENGINE = ReplicatedMergeTree(
'/clickhouse/ssb/tables/{layer}-{shard}/dates',
'{replica}'
) ORDER BY D_DATEKEY SETTINGS index_granularity = 8192;

CREATE TABLE ssb.dates on cluster ck_cluster AS ssb.dates_local ENGINE = Distributed(
ck_cluster,
ssb,
dates_local,
rand()
);

导入数据

1
2
3
4
5
6
7
8
cd ssb-dbgen
clickhouse-client --database=ssb --query "INSERT INTO customer FORMAT CSV" < customer.tbl
clickhouse-client --database=ssb --query "INSERT INTO part FORMAT CSV" < part.tbl
clickhouse-client --database=ssb --query "INSERT INTO supplier FORMAT CSV" < supplier.tbl
clickhouse-client --database=ssb --query "INSERT INTO lineorder FORMAT CSV" < lineorder.tbl
clickhouse-client --database=ssb --query "INSERT INTO dates FORMAT CSV" < date.tbl

以上用时不到600s

Converting “star schema” to denormalized “flat schema”:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
在一个节点执行
set max_bytes_before_external_group_by=2000000000;
set max_memory_usage=4000000000;

CREATE TABLE lineorder_flat_tmp
ENGINE = MergeTree
PARTITION BY toYear(LO_ORDERDATE)
ORDER BY (LO_ORDERDATE, LO_ORDERKEY) AS
SELECT
l.LO_ORDERKEY AS LO_ORDERKEY,
l.LO_LINENUMBER AS LO_LINENUMBER,
l.LO_CUSTKEY AS LO_CUSTKEY,
l.LO_PARTKEY AS LO_PARTKEY,
l.LO_SUPPKEY AS LO_SUPPKEY,
l.LO_ORDERDATE AS LO_ORDERDATE,
l.LO_ORDERPRIORITY AS LO_ORDERPRIORITY,
l.LO_SHIPPRIORITY AS LO_SHIPPRIORITY,
l.LO_QUANTITY AS LO_QUANTITY,
l.LO_EXTENDEDPRICE AS LO_EXTENDEDPRICE,
l.LO_ORDTOTALPRICE AS LO_ORDTOTALPRICE,
l.LO_DISCOUNT AS LO_DISCOUNT,
l.LO_REVENUE AS LO_REVENUE,
l.LO_SUPPLYCOST AS LO_SUPPLYCOST,
l.LO_TAX AS LO_TAX,
l.LO_COMMITDATE AS LO_COMMITDATE,
l.LO_SHIPMODE AS LO_SHIPMODE,
c.C_NAME AS C_NAME,
c.C_ADDRESS AS C_ADDRESS,
c.C_CITY AS C_CITY,
c.C_NATION AS C_NATION,
c.C_REGION AS C_REGION,
c.C_PHONE AS C_PHONE,
c.C_MKTSEGMENT AS C_MKTSEGMENT,
s.S_NAME AS S_NAME,
s.S_ADDRESS AS S_ADDRESS,
s.S_CITY AS S_CITY,
s.S_NATION AS S_NATION,
s.S_REGION AS S_REGION,
s.S_PHONE AS S_PHONE,
p.P_NAME AS P_NAME,
p.P_MFGR AS P_MFGR,
p.P_CATEGORY AS P_CATEGORY,
p.P_BRAND AS P_BRAND,
p.P_COLOR AS P_COLOR,
p.P_TYPE AS P_TYPE,
p.P_SIZE AS P_SIZE,
p.P_CONTAINER AS P_CONTAINER
FROM lineorder AS l
INNER JOIN customer AS c ON c.C_CUSTKEY = l.LO_CUSTKEY
INNER JOIN supplier AS s ON s.S_SUPPKEY = l.LO_SUPPKEY
INNER JOIN part AS p ON p.P_PARTKEY = l.LO_PARTKEY;

0 rows in set. Elapsed: 2086.025 sec. Processed 604.64 million rows, 26.12 GB (289.85 thousand rows/s., 12.52 MB/s.)

bj2-all-clickhouse-test-01 :) select count(*) from lineorder_flat;

SELECT count(*)
FROM lineorder_flat

┌───count()─┐
600037902
└───────────┘

1 rows in set. Elapsed: 0.006 sec.


CREATE TABLE ssb.lineorder_flat_local on cluster ck_cluster
AS ssb.lineorder_flat_tmp
ENGINE = ReplicatedMergeTree(
'/clickhouse/ssb/tables/{layer}-{shard}/lineorder_flat',
'{replica}'
)
PARTITION BY toYear(LO_ORDERDATE)
ORDER BY (LO_ORDERDATE, LO_ORDERKEY)


CREATE TABLE ssb.lineorder_flat on cluster ck_cluster AS ssb.lineorder_flat_local ENGINE = Distributed(
ck_cluster,
ssb,
lineorder_flat_local,
rand()
);

在三个分片执行:

INSERT INTO ssb.lineorder_flat SELECT * from ssb_local.lineorder_flat_tmp;

Ok.

0 rows in set. Elapsed: 747.548 sec. Processed 600.04 million rows, 140.40 GB (802.67 thousand rows/s., 187.82 MB/s.)

clickhouse数据导入用时总计约3500秒

最终数据

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
bj2-all-clickhouse-test-01 :) select count(*) from customer       ;

SELECT count(*)
FROM customer

┌─count()─┐
3000000
└─────────┘

1 rows in set. Elapsed: 0.004 sec.

bj2-all-clickhouse-test-01 :) select count(*) from dates ;

SELECT count(*)
FROM dates

┌─count()─┐
2556
└─────────┘

1 rows in set. Elapsed: 0.004 sec.

bj2-all-clickhouse-test-01 :) select count(*) from lineorder ;

SELECT count(*)
FROM lineorder

┌───count()─┐
600037902
└───────────┘

1 rows in set. Elapsed: 0.003 sec.

bj2-all-clickhouse-test-01 :) select count(*) from lineorder_flat ;

SELECT count(*)
FROM lineorder_flat

┌───count()─┐
622259902
└───────────┘

1 rows in set. Elapsed: 0.003 sec.

bj2-all-clickhouse-test-01 :) select count(*) from part ;

SELECT count(*)
FROM part

┌─count()─┐
1400000
└─────────┘

1 rows in set. Elapsed: 0.003 sec.

bj2-all-clickhouse-test-01 :) select count(*) from supplier ;

SELECT count(*)
FROM supplier

┌─count()─┐
200000
└─────────┘

1 rows in set. Elapsed: 0.004 sec.

数据占用空间

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
[root@bj2-all-clickhouse-test-01 ssb]# ll
total 48
lrwxrwxrwx 1 root root 69 Feb 9 10:51 customer -> /data/clickhouse/node1/store/a61/a61a9f88-8bbb-4864-bd0a-1001f5ffcc1c
lrwxrwxrwx 1 root root 69 Feb 9 10:51 customer_local -> /data/clickhouse/node1/store/c86/c86c8026-f804-4f93-9a2b-051d0fbc1cc9
lrwxrwxrwx 1 root root 69 Feb 9 12:55 dates -> /data/clickhouse/node1/store/95d/95db2800-91b5-44f3-b378-727bc80d25bc
lrwxrwxrwx 1 root root 69 Feb 9 12:55 dates_local -> /data/clickhouse/node1/store/137/1371f45a-88bc-4d14-9f04-07895fd561ba
lrwxrwxrwx 1 root root 69 Feb 9 11:33 lineorder -> /data/clickhouse/node1/store/8f4/8f40504f-4e94-4d70-a1de-dbcb6d25d616
lrwxrwxrwx 1 root root 69 Feb 9 11:54 lineorder_flat -> /data/clickhouse/node1/store/9fd/9fdb7d2c-d2c2-49a0-ad7e-380fc76dff73
lrwxrwxrwx 1 root root 69 Feb 9 11:49 lineorder_flat_local -> /data/clickhouse/node1/store/883/883a4a97-f34b-491b-a305-398bd717cfb9
lrwxrwxrwx 1 root root 69 Feb 9 10:52 lineorder_local -> /data/clickhouse/node1/store/2d6/2d67810e-dfbc-438d-b7aa-6cbfee4c391f
lrwxrwxrwx 1 root root 69 Feb 9 11:36 part -> /data/clickhouse/node1/store/45b/45b4a779-a2df-4048-a266-efc1481fce68
lrwxrwxrwx 1 root root 69 Feb 9 10:53 part_local -> /data/clickhouse/node1/store/83f/83f3ac5d-296b-424f-900e-f13d67d044ae
lrwxrwxrwx 1 root root 69 Feb 9 11:36 supplier -> /data/clickhouse/node1/store/6a0/6a0b538a-8f79-4d24-bae0-82792fe8af19
lrwxrwxrwx 1 root root 69 Feb 9 10:54 supplier_local -> /data/clickhouse/node1/store/5e2/5e29e02d-e35b-4660-a5b9-1bfa1d9fb97d

[root@bj2-all-clickhouse-test-01 ssb]# ll |awk -F'->' '{print $2}'|xargs du -sh
20K /data/clickhouse/node1/store/a61/a61a9f88-8bbb-4864-bd0a-1001f5ffcc1c
39M /data/clickhouse/node1/store/c86/c86c8026-f804-4f93-9a2b-051d0fbc1cc9
20K /data/clickhouse/node1/store/95d/95db2800-91b5-44f3-b378-727bc80d25bc
64K /data/clickhouse/node1/store/137/1371f45a-88bc-4d14-9f04-07895fd561ba
20K /data/clickhouse/node1/store/8f4/8f40504f-4e94-4d70-a1de-dbcb6d25d616
20K /data/clickhouse/node1/store/9fd/9fdb7d2c-d2c2-49a0-ad7e-380fc76dff73
22G /data/clickhouse/node1/store/883/883a4a97-f34b-491b-a305-398bd717cfb9
6.4G /data/clickhouse/node1/store/2d6/2d67810e-dfbc-438d-b7aa-6cbfee4c391f
20K /data/clickhouse/node1/store/45b/45b4a779-a2df-4048-a266-efc1481fce68
8.3M /data/clickhouse/node1/store/83f/83f3ac5d-296b-424f-900e-f13d67d044ae
20K /data/clickhouse/node1/store/6a0/6a0b538a-8f79-4d24-bae0-82792fe8af19
2.6M /data/clickhouse/node1/store/5e2/5e29e02d-e35b-4660-a5b9-1bfa1d9fb97d


28.4G*3=85.2G

单表查询测试

执行

1
2
3
4
5
6
7
8
9
set max_threads=4;

SELECT *
FROM system.settings
WHERE name = 'max_threads'

┌─name────────┬─value─┬─changed─┬─description───────────────────────────────────────────────────────────────────────────────────────┬─min──┬─max──┬─readonly─┬─type───────┐
│ max_threads │ 41 │ The maximum number of threads to execute the request. By default, it is determined automatically. │ ᴺᵁᴸᴸ │ ᴺᵁᴸᴸ │ 0 │ MaxThreads │
└─────────────┴───────┴─────────┴───────────────────────────────────────────────────────────────────────────────────────────────────┴──────┴──────┴──────────┴────────────┘

Q1.1

1
2
3
4
5
6
7
8
9
10
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorder_flat
WHERE (toYear(LO_ORDERDATE) = 1993) AND ((LO_DISCOUNT >= 1) AND (LO_DISCOUNT <= 3)) AND (LO_QUANTITY < 25)

┌────────revenue─┐
46310268722641
└────────────────┘

1 rows in set. Elapsed: 0.226 sec. Processed 94.38 million rows, 755.01 MB (417.85 million rows/s., 3.34 GB/s.)
1 rows in set. Elapsed: 0.195 sec. Processed 94.38 million rows, 755.01 MB (484.98 million rows/s., 3.88 GB/s.)

Q1.2

1
2
3
4
5
6
7
8
9
10
11
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorder_flat
WHERE (toYYYYMM(LO_ORDERDATE) = 199401) AND ((LO_DISCOUNT >= 4) AND (LO_DISCOUNT <= 6)) AND ((LO_QUANTITY >= 26) AND (LO_QUANTITY <= 35))

┌───────revenue─┐
9979491062883
└───────────────┘

1 rows in set. Elapsed: 0.034 sec. Processed 8.07 million rows, 64.55 MB (239.67 million rows/s., 1.92 GB/s.)
1 rows in set. Elapsed: 0.063 sec. Processed 8.07 million rows, 64.55 MB (127.89 million rows/s., 1.02 GB/s.)

Q1.3

1
2
3
4
5
6
7
8
9
10
SELECT sum(LO_EXTENDEDPRICE * LO_DISCOUNT) AS revenue
FROM lineorder_flat
WHERE (toISOWeek(LO_ORDERDATE) = 6) AND (toYear(LO_ORDERDATE) = 1994) AND ((LO_DISCOUNT >= 5) AND (LO_DISCOUNT <= 7)) AND ((LO_QUANTITY >= 26) AND (LO_QUANTITY <= 35))

┌───────revenue─┐
2709633167278
└───────────────┘

1 rows in set. Elapsed: 0.043 sec. Processed 2.03 million rows, 16.23 MB (46.78 million rows/s., 374.26 MB/s.)
1 rows in set. Elapsed: 0.022 sec. Processed 2.03 million rows, 16.23 MB (91.09 million rows/s., 728.68 MB/s.)

Q2.1

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
SELECT
sum(LO_REVENUE),
toYear(LO_ORDERDATE) AS year,
P_BRAND
FROM lineorder_flat
WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA'
GROUP BY
year,
P_BRAND
ORDER BY
year,
P_BRAND;

...

280 rows in set. Elapsed: 1.723 sec. Processed 622.26 million rows, 6.42 GB (361.17 million rows/s., 3.73 GB/s.)
280 rows in set. Elapsed: 1.791 sec. Processed 622.26 million rows, 6.42 GB (347.43 million rows/s., 3.59 GB/s.)

Q2.2

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
SELECT
sum(LO_REVENUE),
toYear(LO_ORDERDATE) AS year,
P_BRAND
FROM lineorder_flat
WHERE P_BRAND >= 'MFGR#2221' AND P_BRAND <= 'MFGR#2228' AND S_REGION = 'ASIA'
GROUP BY
year,
P_BRAND
ORDER BY
year,
P_BRAND;

...

56 rows in set. Elapsed: 1.463 sec. Processed 622.26 million rows, 5.80 GB (425.29 million rows/s., 3.96 GB/s.)
56 rows in set. Elapsed: 1.470 sec. Processed 622.26 million rows, 5.80 GB (423.20 million rows/s., 3.94 GB/s.)

Q2.3

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
SELECT
sum(LO_REVENUE),
toYear(LO_ORDERDATE) AS year,
P_BRAND
FROM lineorder_flat
WHERE (P_BRAND = 'MFGR#2239') AND (S_REGION = 'EUROPE')
GROUP BY
year,
P_BRAND
ORDER BY
year ASC,
P_BRAND ASC

┌─sum(LO_REVENUE)─┬─year─┬─P_BRAND───┐
681530630271992 │ MFGR#2239 │
670093995981993 │ MFGR#2239 │
671856758131994 │ MFGR#2239 │
679881810651995 │ MFGR#2239 │
672655723031996 │ MFGR#2239 │
669221285231997 │ MFGR#2239 │
386306642451998 │ MFGR#2239 │
└─────────────────┴──────┴───────────┘

7 rows in set. Elapsed: 0.659 sec. Processed 622.26 million rows, 5.80 GB (943.74 million rows/s., 8.79 GB/s.)
7 rows in set. Elapsed: 1.337 sec. Processed 622.26 million rows, 5.80 GB (465.51 million rows/s., 4.34 GB/s.)

Q3.1

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
SELECT
C_NATION,
S_NATION,
toYear(LO_ORDERDATE) AS year,
sum(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA' AND year >= 1992 AND year <= 1997
GROUP BY
C_NATION,
S_NATION,
year
ORDER BY
year ASC,
revenue DESC;

...

150 rows in set. Elapsed: 2.488 sec. Processed 566.91 million rows, 5.68 GB (227.84 million rows/s., 2.28 GB/s.)
150 rows in set. Elapsed: 1.254 sec. Processed 566.91 million rows, 5.68 GB (452.10 million rows/s., 4.53 GB/s.)

Q3.2

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
SELECT
C_CITY,
S_CITY,
toYear(LO_ORDERDATE) AS year,
sum(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES' AND year >= 1992 AND year <= 1997
GROUP BY
C_CITY,
S_CITY,
year
ORDER BY
year ASC,
revenue DESC;

...

600 rows in set. Elapsed: 1.272 sec. Processed 566.91 million rows, 5.77 GB (445.80 million rows/s., 4.54 GB/s.)
600 rows in set. Elapsed: 0.966 sec. Processed 566.91 million rows, 5.77 GB (587.07 million rows/s., 5.98 GB/s.)

Q3.3

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
SELECT
C_CITY,
S_CITY,
toYear(LO_ORDERDATE) AS year,
sum(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE ((C_CITY = 'UNITED KI1') OR (C_CITY = 'UNITED KI5')) AND ((S_CITY = 'UNITED KI1') OR (S_CITY = 'UNITED KI5')) AND (year >= 1992) AND (year <= 1997)
GROUP BY
C_CITY,
S_CITY,
year
ORDER BY
year ASC,
revenue DESC

...

24 rows in set. Elapsed: 0.979 sec. Processed 566.91 million rows, 4.63 GB (579.03 million rows/s., 4.73 GB/s.)
24 rows in set. Elapsed: 0.889 sec. Processed 566.91 million rows, 4.63 GB (637.83 million rows/s., 5.21 GB/s.)

Q3.4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
SELECT
C_CITY,
S_CITY,
toYear(LO_ORDERDATE) AS year,
sum(LO_REVENUE) AS revenue
FROM lineorder_flat
WHERE ((C_CITY = 'UNITED KI1') OR (C_CITY = 'UNITED KI5')) AND ((S_CITY = 'UNITED KI1') OR (S_CITY = 'UNITED KI5')) AND (toYYYYMM(LO_ORDERDATE) = 199712)
GROUP BY
C_CITY,
S_CITY,
year
ORDER BY
year ASC,
revenue DESC

┌─C_CITY─────┬─S_CITY─────┬─year─┬───revenue─┐
│ UNITED KI1 │ UNITED KI1 │ 1997495955110
│ UNITED KI5 │ UNITED KI5 │ 1997421714882
│ UNITED KI5 │ UNITED KI1 │ 1997387387637
│ UNITED KI1 │ UNITED KI5 │ 1997380708672
└────────────┴────────────┴──────┴───────────┘

4 rows in set. Elapsed: 0.036 sec. Processed 8.09 million rows, 66.07 MB (221.65 million rows/s., 1.81 GB/s.)
4 rows in set. Elapsed: 0.020 sec. Processed 8.09 million rows, 66.07 MB (402.92 million rows/s., 3.29 GB/s.)

Q4.1

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
SELECT
toYear(LO_ORDERDATE) AS year,
C_NATION,
sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM lineorder_flat
WHERE (C_REGION = 'AMERICA') AND (S_REGION = 'AMERICA') AND ((P_MFGR = 'MFGR#1') OR (P_MFGR = 'MFGR#2'))
GROUP BY
year,
C_NATION
ORDER BY
year ASC,
C_NATION ASC

...

35 rows in set. Elapsed: 5.067 sec. Processed 622.26 million rows, 8.72 GB (122.80 million rows/s., 1.72 GB/s.)
35 rows in set. Elapsed: 2.791 sec. Processed 622.26 million rows, 8.72 GB (222.97 million rows/s., 3.12 GB/s.)

Q4.2

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
SELECT
toYear(LO_ORDERDATE) AS year,
S_NATION,
P_CATEGORY,
sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM lineorder_flat
WHERE C_REGION = 'AMERICA' AND S_REGION = 'AMERICA' AND (year = 1997 OR year = 1998) AND (P_MFGR = 'MFGR#1' OR P_MFGR = 'MFGR#2')
GROUP BY
year,
S_NATION,
P_CATEGORY
ORDER BY
year ASC,
S_NATION ASC,
P_CATEGORY ASC;

...


100 rows in set. Elapsed: 0.804 sec. Processed 149.77 million rows, 2.25 GB (186.21 million rows/s., 2.80 GB/s.)
100 rows in set. Elapsed: 0.752 sec. Processed 149.77 million rows, 2.25 GB (199.13 million rows/s., 2.99 GB/s.)

Q4.3

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
SELECT
toYear(LO_ORDERDATE) AS year,
S_CITY,
P_BRAND,
sum(LO_REVENUE - LO_SUPPLYCOST) AS profit
FROM lineorder_flat
WHERE S_NATION = 'UNITED STATES' AND (year = 1997 OR year = 1998) AND P_CATEGORY = 'MFGR#14'
GROUP BY
year,
S_CITY,
P_BRAND
ORDER BY
year ASC,
S_CITY ASC,
P_BRAND ASC;

...

800 rows in set. Elapsed: 0.561 sec. Processed 149.77 million rows, 2.32 GB (266.84 million rows/s., 4.13 GB/s.)
800 rows in set. Elapsed: 0.483 sec. Processed 149.77 million rows, 2.32 GB (309.80 million rows/s., 4.80 GB/s.)

多表关联测试

DorisDB vs Clickhouse SSB性能测试对比报告, 不知道是不是疏漏了, Doris在测试时设置了

1
set global  parallel_fragment_exec_instance_num  = 8;

但没有写有没有在clickhouse测试时设置SETTINGS max_threads=8;
由于我的环境只有四个4 cpu, 所以我这里设置为4.
通过以下命令可知, lineorder每个分片数据文件大概6.4G(压缩后), 我的环境内存只有8G, 不确定能否都缓存了

这里还需要注意两个问题:

  1. clickhouse分布式表相比单表关联更加不友好, JOIN要改为GLOBAL JOIN
  2. JOIN操作时一定要把数据量小的表放在右边(这会导致SQL改写更加麻烦)

    详见ClickHouse查询分布式表LEFT JOIN改RIGHT JOIN的大坑

所以我对DorisDB vs Clickhouse SSB性能测试对比报告中的多表关联测试SQL进行了微调

Q1.1

1
2
3
4
5
6
7
8
9
10
11
12
SELECT SUM(LO_REVENUE) AS REVENUE
FROM lineorder
GLOBAL INNER JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY), '(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE (D_YEAR = 1993) AND ((LO_DISCOUNT >= 1) AND (LO_DISCOUNT <= 3)) AND (LO_QUANTITY < 25)
SETTINGS max_threads = 4

┌────────REVENUE─┐
21881848590256
└────────────────┘

1 rows in set. Elapsed: 1.496 sec. Processed 600.04 million rows, 4.80 GB (401.10 million rows/s., 3.21 GB/s.)
1 rows in set. Elapsed: 1.424 sec. Processed 600.04 million rows, 4.80 GB (421.33 million rows/s., 3.37 GB/s.)

Q1.2

1
2
3
4
5
6
7
8
9
10
11
12
SELECT SUM(LO_REVENUE) AS REVENUE
FROM lineorder
GLOBAL INNER JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY), '(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE (D_YEARMONTHNUM = 199401) AND ((LO_DISCOUNT >= 4) AND (LO_DISCOUNT <= 6)) AND ((LO_QUANTITY >= 26) AND (LO_QUANTITY <= 35))
SETTINGS max_threads = 4

┌───────REVENUE─┐
1829705342413
└───────────────┘

1 rows in set. Elapsed: 1.366 sec. Processed 600.04 million rows, 4.80 GB (439.39 million rows/s., 3.52 GB/s.)
1 rows in set. Elapsed: 0.659 sec. Processed 600.04 million rows, 4.80 GB (910.87 million rows/s., 7.29 GB/s.)

Q1.3

1
2
3
4
5
6
7
8
9
10
11
12
SELECT SUM(LO_REVENUE) AS REVENUE
FROM lineorder
GLOBAL INNER JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY), '(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE (D_WEEKNUMINYEAR = 6) AND (D_YEAR = 1994) AND ((LO_DISCOUNT >= 5) AND (LO_DISCOUNT <= 7)) AND ((LO_QUANTITY >= 26) AND (LO_QUANTITY <= 35))
SETTINGS max_threads = 4

┌──────REVENUE─┐
407995993835
└──────────────┘

1 rows in set. Elapsed: 0.678 sec. Processed 600.04 million rows, 4.80 GB (885.40 million rows/s., 7.08 GB/s.)
1 rows in set. Elapsed: 1.377 sec. Processed 600.04 million rows, 4.80 GB (435.84 million rows/s., 3.49 GB/s.)

Q2.1

1
2
3
4
5
6
7
8
9
10
11
12
13
14
SELECT SUM(LO_REVENUE) AS LO_REVENUE, D_YEAR, P_BRAND
FROM lineorder
GLOBAL JOIN part ON LO_PARTKEY = P_PARTKEY
GLOBAL JOIN supplier ON LO_SUPPKEY = S_SUPPKEY
GLOBAL INNER JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY),'(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE P_CATEGORY = 'MFGR#12' AND S_REGION = 'AMERICA'
GROUP BY D_YEAR, P_BRAND
ORDER BY D_YEAR, P_BRAND
SETTINGS max_threads = 4;

...

280 rows in set. Elapsed: 4.367 sec. Processed 601.81 million rows, 8.45 GB (137.81 million rows/s., 1.94 GB/s.)
280 rows in set. Elapsed: 2.667 sec. Processed 601.81 million rows, 8.45 GB (225.69 million rows/s., 3.17 GB/s.)

Q2.2

1
2
3
4
5
6
7
8
9
10
11
12
13
14
SELECT SUM(LO_REVENUE) AS LO_REVENUE, D_YEAR, P_BRAND
FROM lineorder
GLOBAL JOIN part ON LO_PARTKEY = P_PARTKEY
GLOBAL JOIN supplier ON LO_SUPPKEY = S_SUPPKEY
GLOBAL JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY),'(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE P_BRAND BETWEEN 'MFGR#2221' AND 'MFGR#2228' AND S_REGION = 'ASIA'
GROUP BY D_YEAR, P_BRAND
ORDER BY D_YEAR, P_BRAND
SETTINGS max_threads = 4;

...

56 rows in set. Elapsed: 4.498 sec. Processed 601.67 million rows, 8.45 GB (133.78 million rows/s., 1.88 GB/s.)
56 rows in set. Elapsed: 1.554 sec. Processed 601.67 million rows, 8.45 GB (387.23 million rows/s., 5.44 GB/s.)

Q2.3

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
SELECT
SUM(LO_REVENUE) AS LO_REVENUE,
D_YEAR,
P_BRAND
FROM lineorder
GLOBAL INNER JOIN part ON LO_PARTKEY = P_PARTKEY
GLOBAL INNER JOIN supplier ON LO_SUPPKEY = S_SUPPKEY
GLOBAL INNER JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY), '(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE (P_BRAND = 'MFGR#2239') AND (S_REGION = 'EUROPE')
GROUP BY
D_YEAR,
P_BRAND
ORDER BY
D_YEAR ASC,
P_BRAND ASC
SETTINGS max_threads = 4

┌──LO_REVENUE─┬─D_YEAR─┬─P_BRAND───┐
657515897231992 │ MFGR#2239 │
645328448011993 │ MFGR#2239 │
647225990021994 │ MFGR#2239 │
656164326831995 │ MFGR#2239 │
648028846861996 │ MFGR#2239 │
644855411651997 │ MFGR#2239 │
372765363611998 │ MFGR#2239 │
└─────────────┴────────┴───────────┘


7 rows in set. Elapsed: 2.569 sec. Processed 601.64 million rows, 8.45 GB (234.18 million rows/s., 3.29 GB/s.)
7 rows in set. Elapsed: 2.577 sec. Processed 601.64 million rows, 8.45 GB (233.47 million rows/s., 3.28 GB/s.)

Q3.1

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
SELECT C_NATION, S_NATION, D_YEAR, SUM(LO_REVENUE) AS LO_REVENUE
FROM lineorder
GLOBAL JOIN customer ON LO_CUSTKEY = C_CUSTKEY
GLOBAL JOIN supplier ON LO_SUPPKEY = S_SUPPKEY
GLOBAL JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY),'(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE C_REGION = 'ASIA' AND S_REGION = 'ASIA'AND D_YEAR >= 1992 AND D_YEAR <= 1997
GROUP BY C_NATION, S_NATION, D_YEAR
ORDER BY D_YEAR ASC, LO_REVENUE DESC
SETTINGS max_threads = 4;

...

150 rows in set. Elapsed: 10.190 sec. Processed 605.04 million rows, 8.66 GB (59.38 million rows/s., 850.20 MB/s.)

150 rows in set. Elapsed: 12.960 sec. Processed 605.04 million rows, 8.66 GB (46.69 million rows/s., 668.51 MB/s.)

Q3.2

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
SELECT C_CITY, S_CITY, D_YEAR, SUM(LO_REVENUE) AS LO_REVENUE
FROM lineorder
GLOBAL JOIN customer ON LO_CUSTKEY = C_CUSTKEY
GLOBAL JOIN supplier ON LO_SUPPKEY = S_SUPPKEY
GLOBAL JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY),'(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE C_NATION = 'UNITED STATES' AND S_NATION = 'UNITED STATES'
AND D_YEAR >= 1992 AND D_YEAR <= 1997
GROUP BY C_CITY, S_CITY, D_YEAR
ORDER BY D_YEAR ASC, LO_REVENUE DESC
SETTINGS max_threads = 4;

...

600 rows in set. Elapsed: 5.926 sec. Processed 603.60 million rows, 8.66 GB (101.85 million rows/s., 1.46 GB/s.)
600 rows in set. Elapsed: 5.743 sec. Processed 603.60 million rows, 8.66 GB (105.10 million rows/s., 1.51 GB/s.)

Q3.3

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
SELECT
C_CITY,
S_CITY,
D_YEAR,
SUM(LO_REVENUE) AS LO_REVENUE
FROM lineorder
GLOBAL INNER JOIN customer ON LO_CUSTKEY = C_CUSTKEY
GLOBAL INNER JOIN supplier ON LO_SUPPKEY = S_SUPPKEY
GLOBAL INNER JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY), '(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE ((C_CITY = 'UNITED KI1') OR (C_CITY = 'UNITED KI5')) AND ((S_CITY = 'UNITED KI1') OR (S_CITY = 'UNITED KI5')) AND (D_YEAR >= 1992) AND (D_YEAR <= 1997)
GROUP BY
C_CITY,
S_CITY,
D_YEAR
ORDER BY
D_YEAR ASC,
LO_REVENUE DESC
SETTINGS max_threads = 4

...

24 rows in set. Elapsed: 3.445 sec. Processed 603.31 million rows, 8.65 GB (175.11 million rows/s., 2.51 GB/s.)
24 rows in set. Elapsed: 3.300 sec. Processed 603.31 million rows, 8.65 GB (182.79 million rows/s., 2.62 GB/s.)

Q3.4

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
SELECT
C_CITY,
S_CITY,
D_YEAR,
SUM(LO_REVENUE) AS LO_REVENUE
FROM lineorder
GLOBAL INNER JOIN customer ON LO_CUSTKEY = C_CUSTKEY
GLOBAL INNER JOIN supplier ON LO_SUPPKEY = S_SUPPKEY
GLOBAL INNER JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY), '(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE ((C_CITY = 'UNITED KI1') OR (C_CITY = 'UNITED KI5')) AND ((S_CITY = 'UNITED KI1') OR (S_CITY = 'UNITED KI5')) AND (D_YEARMONTH = 'Dec1997')
GROUP BY
C_CITY,
S_CITY,
D_YEAR
ORDER BY
D_YEAR ASC,
LO_REVENUE DESC
SETTINGS max_threads = 4

┌─C_CITY─────┬─S_CITY─────┬─D_YEAR─┬─LO_REVENUE─┐
│ UNITED KI1 │ UNITED KI1 │ 1997481119563
│ UNITED KI5 │ UNITED KI5 │ 1997386477033
│ UNITED KI5 │ UNITED KI1 │ 1997378048353
│ UNITED KI1 │ UNITED KI5 │ 1997366630529
└────────────┴────────────┴────────┴────────────┘


4 rows in set. Elapsed: 3.455 sec. Processed 603.31 million rows, 8.65 GB (174.63 million rows/s., 2.50 GB/s.)
4 rows in set. Elapsed: 3.330 sec. Processed 603.31 million rows, 8.65 GB (181.19 million rows/s., 2.60 GB/s.

Q4.1

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
SELECT
D_YEAR,
C_NATION,
SUM(LO_REVENUE) - SUM(LO_SUPPLYCOST) AS PROFIT
FROM lineorder
GLOBAL INNER JOIN customer ON LO_CUSTKEY = C_CUSTKEY
GLOBAL INNER JOIN supplier ON LO_SUPPKEY = S_SUPPKEY
GLOBAL INNER JOIN part ON LO_PARTKEY = P_PARTKEY
GLOBAL INNER JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY), '(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE (C_REGION = 'AMERICA') AND (S_REGION = 'AMERICA') AND ((P_MFGR = 'MFGR#1') OR (P_MFGR = 'MFGR#2'))
GROUP BY
D_YEAR,
C_NATION
ORDER BY
D_YEAR ASC,
C_NATION ASC
SETTINGS max_threads = 4

...

35 rows in set. Elapsed: 15.560 sec. Processed 606.44 million rows, 13.47 GB (38.97 million rows/s., 865.71 MB/s.)
35 rows in set. Elapsed: 9.494 sec. Processed 606.44 million rows, 13.47 GB (63.88 million rows/s., 1.42 GB/s.)

Q4.2

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
SELECT
D_YEAR,
S_NATION,
P_CATEGORY,
SUM(LO_REVENUE) - SUM(LO_SUPPLYCOST) AS PROFIT
FROM lineorder
GLOBAL INNER JOIN customer ON LO_CUSTKEY = C_CUSTKEY
GLOBAL INNER JOIN supplier ON LO_SUPPKEY = S_SUPPKEY
GLOBAL INNER JOIN part ON LO_PARTKEY = P_PARTKEY
GLOBAL INNER JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY), '(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE (C_REGION = 'AMERICA') AND (S_REGION = 'AMERICA') AND ((D_YEAR = 1997) OR (D_YEAR = 1998)) AND ((P_MFGR = 'MFGR#1') OR (P_MFGR = 'MFGR#2'))
GROUP BY
D_YEAR,
S_NATION,
P_CATEGORY
ORDER BY
D_YEAR ASC,
S_NATION ASC,
P_CATEGORY ASC
SETTINGS max_threads = 4

...

100 rows in set. Elapsed: 16.109 sec. Processed 606.44 million rows, 13.47 GB (37.64 million rows/s., 836.01 MB/s.)
100 rows in set. Elapsed: 18.048 sec. Processed 606.44 million rows, 13.47 GB (33.60 million rows/s., 746.35 MB/s.)

Q4.3

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
SELECT
D_YEAR,
S_CITY,
P_BRAND,
SUM(LO_REVENUE) - SUM(LO_SUPPLYCOST) AS PROFIT
FROM lineorder
GLOBAL INNER JOIN customer ON LO_CUSTKEY = C_CUSTKEY
GLOBAL INNER JOIN supplier ON LO_SUPPKEY = S_SUPPKEY
GLOBAL INNER JOIN part ON LO_PARTKEY = P_PARTKEY
GLOBAL INNER JOIN dates ON LO_ORDERDATE = toDate(replaceRegexpAll(toString(D_DATEKEY), '(\\d{4})(\\d{2})(\\d{2})', '\\1-\\2-\\3'))
WHERE (C_REGION = 'AMERICA') AND (S_NATION = 'UNITED STATES') AND ((D_YEAR = 1997) OR (D_YEAR = 1998)) AND (P_CATEGORY = 'MFGR#14')
GROUP BY
D_YEAR,
S_CITY,
P_BRAND
ORDER BY
D_YEAR ASC,
S_CITY ASC,
P_BRAND ASC
SETTINGS max_threads = 4

...

800 rows in set. Elapsed: 15.685 sec. Processed 606.44 million rows, 13.47 GB (38.66 million rows/s., 858.94 MB/s.)
800 rows in set. Elapsed: 14.838 sec. Processed 606.44 million rows, 13.47 GB (40.87 million rows/s., 907.94 MB/s.)

Powered by Hexo and Hexo-theme-hiker

Copyright © 2013 - 2021 Fan() All Rights Reserved.

访客数 : | 访问量 :