How to speed up InfluxDB data writes using Python on a Raspberry Pi

Question

I am working on a Raspberry Pi based fast data logger. The idea is to use the Pi to interface with a LIS3DH accelerometer over SPI as fast as we can and send the data to InfluxDB as a time series. I am not sure if it is crazy, but I was planning to push the sample frequency up to 5 kHz.

My first implementation is getting the data to InfluxDB fine using Python, but I'd like to improve the throughput.

Here is a portion of my code that sends data as recorded:

# Loop above that is keeping track of run time and other user control ^

# Setup the variables
lastSampleTime = time.time()
measurement = "accel"
location = "location1"
# May need to include additional location in future

# Run the periodic sampling
while (time.time() - lastSampleTime) < SampleTime:

    # Get values from the device as a list [x, y, z]
    AccelData = GetReadingsNow(lis3dh)

    # Obtain current time in nanoseconds
    timenow = time.time_ns()

    # Construct the data to be sent
    data = [{
        "measurement": measurement,
        "tags": {
            "location": location,
        },
        "time": timenow,
        "fields": {
            "x": AccelData[0],
            "y": AccelData[1],
            "z": AccelData[2]
        }
    }]

    # Send the JSON data to InfluxDB (one write per sample)
    client.write_points(data)

According to the InfluxDB documentation, the way to speed up writes is to send them in batches. That leads to my questions:

Can I use two threads, where one thread is responsible for sampling and the other sends data to InfluxDB?
If so, how will thread 2 know what has already been sent out of this changing list?
If I can't use threading, what would be the solution to minimize data loss in both sampling and writing to InfluxDB?

Thank you for your help.

And you are using in-kernel driver for that sensor, are you? (The main recommendation: do not use Python or other slow languages to communicate with sensors, and do not use user space drivers for them in non-RTOSes) — 0andriy, Jun 22 '20 at 19:58

score 0 · Answer 1 · answered Jun 22 '20 at 12:39

Do you know where the current bottleneck is? Is it GetReadingsNow(lis3dh), network I/O, InfluxDB, a combination of these, etc.? Is InfluxDB also running on the Pi?

It would be best to understand where the bottleneck is before switching to multithreading.
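One rough way to locate the bottleneck is to time the read and the write separately and compare each against the 200 µs per-sample budget that a 5 kHz rate implies. The following is only a minimal sketch, not your code: `mean_call_time`, `fake_read`, and `fake_write` are hypothetical names, and on the Pi the two stand-ins would be replaced by calls such as `GetReadingsNow(lis3dh)` and `client.write_points(data)`.

```python
import time

SAMPLE_RATE_HZ = 5000
budget_s = 1 / SAMPLE_RATE_HZ  # time available per sample at 5 kHz (200 microseconds)


def mean_call_time(fn, repeats):
    """Return the mean wall-clock seconds per call of fn() over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


# Hypothetical stand-ins so the sketch runs without hardware or a database.
def fake_read():
    # Replace with: GetReadingsNow(lis3dh)
    return [0.0, 0.0, 1.0]


def fake_write():
    # Replace with: client.write_points(data); the sleep imitates a slow write.
    time.sleep(0.001)


read_s = mean_call_time(fake_read, repeats=1000)
write_s = mean_call_time(fake_write, repeats=20)

for name, seconds in (("read", read_s), ("write", write_s)):
    verdict = "within" if seconds <= budget_s else "over"
    print(f"{name}: {seconds * 1e6:.1f} us per call ({verdict} the {budget_s * 1e6:.0f} us budget)")
```

If the read alone is over budget, batching or threading the writes will not get you to 5 kHz; if only the write is over budget, moving it off the sampling loop is the natural next step.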

However, to answer your specific questions:

Yes, using one thread to read from the device and another to send the readings to InfluxDB would work and might solve the problem. Here's an answer with a simple example of using threads in Python.
Hopefully the example code in the answer linked above helps with this. Thread 1 would read values from the device and add them to a list (acting as a buffer). When the list has built up 5K (or some reasonable number) of values, it would put the list into a queue. It would then create a new list and start storing values in that. Thread 2 would pull lists of values out of the queue and send them to InfluxDB. Since Thread 1 always creates a new list after adding the old one to the queue, you never have to worry about figuring out what changed. Thread 2 simply takes a list of values from the queue and sends it.
I'm not sure I understand what data is being lost. If by lost you mean not reaching the full 5 kHz sample rate, then you'll first have to do as suggested above: find the bottleneck and fix it.
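The two-thread arrangement described above can be sketched roughly as follows, using the standard-library `queue.Queue` to hand complete batches from the sampler to the writer. This is an illustration under assumptions rather than a drop-in solution: `sampler`, `writer`, `fake_read`, `fake_write`, `BATCH_SIZE`, and `STOP` are hypothetical names, the loop runs for a fixed number of samples instead of a time window so that it finishes deterministically, and on the Pi `fake_read` would be `GetReadingsNow(lis3dh)` and `fake_write` would presumably be `client.write_points`.

```python
import queue
import threading
import time

BATCH_SIZE = 5000  # readings per batch; an arbitrary starting point to tune
STOP = None        # sentinel telling the writer thread to exit


def sampler(read_fn, out_q, n_samples, batch_size=BATCH_SIZE):
    """Read samples, buffer them in a list, and queue each full list."""
    batch = []
    for _ in range(n_samples):
        x, y, z = read_fn()
        batch.append({
            "measurement": "accel",
            "tags": {"location": "location1"},
            "time": time.time_ns(),
            "fields": {"x": x, "y": y, "z": z},
        })
        if len(batch) >= batch_size:
            out_q.put(batch)
            batch = []  # start a fresh list; the queued one is never touched again
    if batch:
        out_q.put(batch)  # flush the final partial batch
    out_q.put(STOP)


def writer(in_q, write_fn):
    """Take whole batches off the queue and send each one in a single call."""
    while True:
        batch = in_q.get()
        if batch is STOP:
            break
        write_fn(batch)


# Hypothetical stand-ins so the sketch runs without hardware or a database.
def fake_read():
    return [0.0, 0.0, 1.0]


sent = []  # records the size of every batch that was "written"


def fake_write(points):
    sent.append(len(points))


q = queue.Queue()
writer_thread = threading.Thread(target=writer, args=(q, fake_write))
writer_thread.start()
sampler(fake_read, q, n_samples=12000)  # sampling runs in the main thread
writer_thread.join()
print(sent)
```

Because each list is handed over whole and then replaced, the writer never needs to track which points have already been sent. Threads tend to help here mainly when the write is waiting on network or disk I/O; if the sampling itself is CPU-bound in Python, a second thread is unlikely to raise the sample rate.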
