TickTockDB vs. InfluxDB: max cardinality comparison on OrangePI zero2 - 1

1. Introduction
In our previous wikis, we evaluated how much cardinality TickTockDB and InfluxDB can handle on an RPI-zero-Wireless (ARMv6, 32-bit OS). In this experiment, we want to see how they perform on another Single Board Computer (SBC), the OrangePI-zero2.

In our tests, we let clients send data points to a TSDB periodically (one data point every 10 seconds per time series) and measure the maximum number of time series the TSDB can handle. We think this is close to normal scenarios, in which there is a certain interval between two consecutive operations from a client. For example, CPU data are collected once every 10 seconds.

2. IoTDB-benchmark Introduction
We select IoTDB-benchmark for performance evaluation. Please refer to its README and the introduction in the previous wiki for details.

3. Experiment settings

3.1. Hardware
The figure shows an OrangePI-zero2, a Single Board Computer (SBC) with
  • Quad-core Cortex-A53 processor, 1.5GHz, ARMv8-A (64-bit),
  • 1GB DDR3 memory,
  • 802.11 b/g/n wireless LAN,
  • 128GB V30 Extreme SD card (SanDisk),
  • OS: Ubuntu 22.04.2 LTS (Jammy Jellyfish),
  • Cost: $35.99 on Amazon (not including the SD card)
We run IoTDB-benchmark on an Ubuntu laptop with an AMD Ryzen 5 5600H CPU (12 hardware threads) and 20GB of memory. To minimize network bottlenecks, we connect the laptop to the OrangePI-zero2 directly with a network cable, and assign static IPs to both by running, e.g., on the OrangePI-zero2:

sudo ip addr add 10.0.0.4/24 dev eth0
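and on the laptop, a matching address in the same subnet, e.g. (the address and interface name here are assumptions; adjust them to your setup):

sudo ip addr add 10.0.0.5/24 dev eth0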

3.2. Software
TickTockDB
  • Version: 0.11.5
  • Config: tt.conf
  • Most configs are default except the following. You can run config.sh to list them.

    ylin30@raspberrypi:~/ticktock $ ./admin/config.sh
    {  "tsdb.timestamp.resolution": "millisecond"}
  • Please raise the open-file limit to a very high number. See this instruction; a minimal example follows.
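A minimal sketch of raising the limit (the exact value and method should follow the linked instruction):

    ulimit -n 65535    # current shell only
    # to persist across logins, add to /etc/security/limits.conf:
    #   ylin30  soft  nofile  65535
    #   ylin30  hard  nofile  65535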
For comparison purposes, we pick InfluxDB, since it is the most popular TSDB.

InfluxDB
  • Version: 1.8.10 (Note: we did actually try its latest version, 2.6.1. To our surprise, v2.6.1 is even worse than v1.8.10 in terms of max cardinality. We present the comparison results later in this wiki.)
  • Config: default
  • We provide a wiki on how to install InfluxDB on Raspbian 11 here. It is also applicable to OrangePI.
IoTDB-benchmark
  • Version: main
  • Sample config:
  • Important settings in the config (a sketch of the corresponding config.properties entries follows this list):
    ◦ Read-write ratio: reads (10%) and writes (90%).
    ◦ Loop: 2160, with a 10-second interval, which keeps each test running for 6 hours (= 2160 * 10s).
    ◦ Number of sensors per device: 10.
    ◦ We scale up the load by increasing the number of devices from 1k to 140k.
    ◦ We bind each client to 100 devices, so we update CLIENT_NUMBER and DEVICE_NUMBER in config.properties for each test.
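For illustration, the relevant config.properties entries for the 10k-device run might look like the sketch below. The key names follow IoTDB-benchmark's sample config, but treat the exact names and the operation-proportion format as assumptions to verify against your benchmark version.

    # sketch for the 10k-device (100K-cardinality) run
    CLIENT_NUMBER=100      # 10k devices / 100 devices per client
    DEVICE_NUMBER=10000
    SENSOR_NUMBER=10       # 10 sensors per device -> 100K time series
    LOOP=2160              # 2160 loops * 10s interval = 6 hours
    POINT_STEP=10000       # 10-second interval between points, in milliseconds
    # the 90% write / 10% read mix is set via OPERATION_PROPORTION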

The above configs simulate a list of clients collecting a list of metrics (DEVICE_NUMBER * 10 sensors per device) every 10 seconds and sending them to TickTockDB/InfluxDB. Note that we use the InfluxDB line write protocol for both TickTockDB and InfluxDB, since it is more concise than both the OpenTSDB plain put protocol and InfluxDB v1 batch writes. Essentially, the line protocol can send multiple data points in just one line, e.g., you can send cpu.usr, cpu.sys, and cpu.idle of cpu 1 on host rpi in one line:

cpu,host=rpi,id=1 usr=10,sys=20,idle=70 1465839830000
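For comparison, the same three data points in the OpenTSDB plain put protocol take three lines, one metric per line:

put cpu.usr 1465839830 10 host=rpi id=1
put cpu.sys 1465839830 20 host=rpi id=1
put cpu.idle 1465839830 70 host=rpi id=1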

(To be continued)


4. 100K cardinality: Resource consumption comparison
We first test 100 clients sending 100K data points (10k devices * 10 sensors per device) every 10 seconds. Each sensor's data is a unique time series, so the cardinality is 100K (= 10k devices * 10 sensors/device). Note that this is not a backfill case, in which data points are sent back to back as soon as the previous write finishes. The write throughput is fixed by our test setup, so it doesn't make sense to compare throughput between TickTockDB and InfluxDB. Instead, we compare how much OS resource TickTockDB and InfluxDB consume at this load. The less a TSDB consumes, the better.

4.1. CPU
The above figures show the cpu.idle metric during the tests (the higher, the better). CPU idle was 94% for TickTockDB and 10%-20% for InfluxDB. TickTockDB consumes much less CPU than InfluxDB.

4.2. IO Util
InfluxDB's IO util was about 80%, while TickTockDB's was almost negligible at this load.
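IO util here is the disk utilization percentage, e.g., the %util column from iostat (a sketch; the figures in this wiki may have come from a different collector):

iostat -x 10    # extended disk stats, including %util, sampled every 10 seconds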

4.3 Write and read bytes rate
TickTockDB's write bytes rate was between 40KB/s and 100KB/s, while InfluxDB's was 1.8MB/s to 2MB/s. The final data size was 210MB in TickTockDB's data dir and 127MB in InfluxDB's. At first glance, the write byte rates and data dir sizes seem to contradict each other: TickTockDB's data directory should be the larger one. Note, however, that the OS flushes data to disk page by page; if a page being flushed is not full or contains unnecessary data, the bytes written are larger than necessary. Over the 6-hour test, InfluxDB wrote roughly 1.9MB/s * 21600s ≈ 40GB to end up with a 127MB data dir, while TickTockDB wrote at most 100KB/s * 21600s ≈ 2.1GB for its 210MB. This indicates that TickTockDB's write IO is more efficient than InfluxDB's, though InfluxDB's data compression ratio is better.
Both TickTockDB's and InfluxDB's read bytes rates were very small.

4.4 Memory
TickTockDB's RSS memory grew up to 130MB, while InfluxDB's stayed at 400MB to 450MB. TickTockDB consumes less RSS memory than InfluxDB.
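RSS can be spot-checked with, e.g., ps (a sketch; the process names below are assumptions):

ps -C ticktock -o rss=    # TickTockDB resident memory in KB
ps -C influxd -o rss=     # InfluxDB resident memory in KB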

(To be continued)

5. Max cardinality: Resource consumption comparison
We would also like to know the max cardinality that TickTockDB and InfluxDB can handle. So we increased the client number (and the device number correspondingly) gradually (5k, 10k, 12k, 100k, 140k devices) to see when TickTockDB/InfluxDB would start to saturate one of the OS resources, or when the whole test would take longer than 6 hours to finish (meaning operations could not, on average, finish within the 10-second interval).

The following figures show all kinds of OS resources during the tests. Both TickTockDB and InfluxDB consume more and more resources as cardinality grows, almost proportionally. Please refer to the figures for details.

5.1. CPU

InfluxDB saturated CPU with 12k devices (i.e., 120K cardinality). CPU idle was almost 0.

TickTockDB consumes much less CPU than InfluxDB. With 100k devices (i.e., 1M cardinality), CPU idle was 40%-50%. With 140k devices (i.e., 1.4M cardinality), CPU idle was 10%-20%. There was still a little room left in CPU.

5.2. IO Util
We just saw above that InfluxDB saturated CPU with 12k devices (i.e., 120K cardinality). At that load, IO util was only about 50%, even smaller than the IO util (80%) with 10k devices. This indicates that IO was not the bottleneck; the saturated CPU throttled the writes instead. InfluxDB can't handle 12k devices (i.e., 120K cardinality): the whole test took 22458.01 seconds, much longer than the planned 21600 seconds. So we concluded that the max cardinality InfluxDB can handle is 100K under this experimental setup (10 sensors per device, 10-second sleep, 10% reads vs. 90% writes, etc.).

TickTockDB consumes much less IO than InfluxDB at the same load. With 10k devices, IO util was almost negligible. With 100k devices (i.e., 1M cardinality), IO util was about 10%. With 140k devices (i.e., 1.4M cardinality), IO util was mostly less than 30%.

5.3 Write bytes rate

Write bytes rate patterns are very similar to IO util. For InfluxDB, the write bytes rate with 12k devices was even smaller than with 10k devices; this is because CPU was already saturated and it couldn't handle 12k devices.

For TickTockDB, write byte rates went up proportionally to the device number. The max was less than 2.3MB/s, which happened with 140k devices (1.4M cardinality).
Note that we used a V30 SanDisk SD card. We tested its write and read bytes rates using dd; they were 22.8MB/s and 22.6MB/s, respectively. So the write bytes rate was still far from saturation.



ylin30@orangepizero2:~$ dd if=/dev/zero of=./test bs=512k count=4096
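The read side can be measured by reading the file back (a sketch; iflag=direct bypasses the page cache so the SD card, rather than RAM, is measured):

ylin30@orangepizero2:~$ dd if=./test of=/dev/null bs=512k iflag=direct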

5.4 Read bytes rate

We can see in the figure above that read byte rates went up as cardinality increased, but remained relatively small (less than 1MB/s).

5.5 Memory

RSS memory of both InfluxDB and TickTockDB went up proportionally to cardinality. InfluxDB used 550MB at its max cardinality (120K = 12k devices * 10 sensors/device), leaving 450MB of memory still available. TickTockDB used 750MB at its max cardinality (1.4M = 140k devices * 10 sensors/device), leaving 250MB still available.

5.6 Summary

InfluxDB saturated with 12k devices (i.e., 120K cardinality). CPU was completely saturated, while IO util was only about 50%, even lower than the IO util (80%) at 10k devices; IO was not the bottleneck. The whole test took 22458.01 seconds, much longer than the planned 21600 seconds. So we concluded that the max cardinality InfluxDB can handle is 100K under this experimental setup (10 sensors per device, 10-second sleep, 10% reads vs. 90% writes, etc.).

TickTockDB was close to saturation with 140k devices (i.e., 1.4M cardinality). CPU was also the bottleneck; other resources (memory, IO util, read/write rates) were still available. We consider TickTockDB's max cardinality to be 1.4M.

(To be continued)

6. InfluxDB 1.8.10 vs. 2.6.1

Strangely, InfluxDB v2.6.1 performs even worse than v1.8.10 on OrangePI-zero2. We list the OS resource consumption figures of InfluxDB v2.6.1 below. You can see that IO util was already 100% with just 5k devices.

Recall that InfluxDB v1.8.10's IO util was only 28% with 5k devices and 80% with 10k devices. Please see the figure below.

InfluxDB v2.6.1 can't handle more than 5k devices (i.e., 50K cardinality), while we have shown that InfluxDB v1.8.10 can handle 10k devices (i.e., 100K cardinality). We don't have enough knowledge to explain the reason (some configs may need to be adjusted for v2.6.1 to perform better). It looks like InfluxDB v2.6.1 handles IO less efficiently than v1.8.10. OrangePI devices (like other SBCs such as RaspberryPI) use SD cards, which have very limited IO capability, so InfluxDB v1.8.10 may be a better fit for OrangePI than v2.6.1. The situation may change on x86 servers with SSD/HDD drives.

7. Conclusion
  • We compared TickTockDB with InfluxDB on OrangePI-zero2 (ARMv8-A, 64-bit OS) in terms of max cardinality. Instead of backfill scenarios, we simulated normal scenarios in which a list of clients sends a list of time series (100 devices per client, 10 sensors per device) at a 10-second interval.
  • InfluxDB's max cardinality is 100K (i.e., 10k devices * 10 sensors/device).
  • TickTockDB's max cardinality is 1.4M (i.e., 140k devices * 10 sensors/device).
  • At the same cardinality load (10k devices), TickTockDB consumes much less CPU, IO, and memory than InfluxDB.
  • On OrangePI, CPU is the bottleneck for both TickTockDB and InfluxDB; it saturated earliest among all OS resources.
  • We happened to find that InfluxDB v1.8.10 performs better than v2.6.1 on OrangePI.
