I have a .txt logfile with IMU sensor measurements that needs to be parsed into a .csv file. The accelerometer and gyroscope have a 500 Hz ODR (output data rate), the magnetometer 100 Hz, and the GPS and barometer 1 Hz. Wi-Fi, BLE, pressure, light, etc. are also logged, but most of that is not needed. The smartphone app doesn't save all measurements sequentially.
It takes 1000+ seconds to parse a file of 200k+ lines into a pandas DataFrame, sort the DataFrame on the timestamps, and save it as a CSV file.
When assigning sensor measurements to a cell (row = timestamp, column = sensor measurement) in the DataFrame, some assignments take ~40% of the runtime, while others take around 0.1% of the runtime.
What could be the reason for this? It shouldn't take 1000+ seconds.
What is in the logfile:
ACCE;AppTimestamp(s);SensorTimestamp(s);Acc_X(m/s^2);Acc_Y(m/s^2);Acc_Z(m/s^2);Accuracy(integer)
GYRO;AppTimestamp(s);SensorTimestamp(s);Gyr_X(rad/s);Gyr_Y(rad/s);Gyr_Z(rad/s);Accuracy(integer)
MAGN;AppTimestamp(s);SensorTimestamp(s);Mag_X(uT);;Mag_Y(uT);Mag_Z(uT);Accuracy(integer)
PRES;AppTimestamp(s);SensorTimestamp(s);Pres(mbar);Accuracy(integer)
LIGH;AppTimestamp(s);SensorTimestamp(s);Light(lux);Accuracy(integer)
PROX;AppTimestamp(s);SensorTimestamp(s);prox(?);Accuracy(integer)
HUMI;AppTimestamp(s);SensorTimestamp(s);humi(Percentage);Accuracy(integer)
TEMP;AppTimestamp(s);SensorTimestamp(s);temp(Celsius);Accuracy(integer)
AHRS;AppTimestamp(s);SensorTimestamp(s);PitchX(deg);RollY(deg);YawZ(deg);RotVecX();RotVecY();RotVecZ();Accuracy(int)
GNSS;AppTimestamp(s);SensorTimeStamp(s);Latit(deg);Long(deg);Altitude(m);Bearing(deg);Accuracy(m);Speed(m/s);SatInView;SatInUse
WIFI;AppTimestamp(s);SensorTimeStamp(s);Name_SSID;MAC_BSSID;RSS(dBm);
BLUE;AppTimestamp(s);Name;MAC_Address;RSS(dBm);
BLE4;AppTimestamp(s);MajorID;MinorID;RSS(dBm);
SOUN;AppTimestamp(s);RMS;Pressure(Pa);SPL(dB);
RFID;AppTimestamp(s);ReaderNumber(int);TagID(int);RSS_A(dBm);RSS_B(dBm);
IMUX;AppTimestamp(s);SensorTimestamp(s);Counter;Acc_X(m/s^2);Acc_Y(m/s^2);Acc_Z(m/s^2);Gyr_X(rad/s);Gyr_Y(rad/s);Gyr_Z(rad/s);Mag_X(uT);;Mag_Y(uT);Mag_Z(uT);Roll(deg);Pitch(deg);Yaw(deg);Quat(1);Quat(2);Quat(3);Quat(4);Pressure(mbar);Temp(Celsius)
IMUL;AppTimestamp(s);SensorTimestamp(s);Counter;Acc_X(m/s^2);Acc_Y(m/s^2);Acc_Z(m/s^2);Gyr_X(rad/s);Gyr_Y(rad/s);Gyr_Z(rad/s);Mag_X(uT);;Mag_Y(uT);Mag_Z(uT);Roll(deg);Pitch(deg);Yaw(deg);Quat(1);Quat(2);Quat(3);Quat(4);Pressure(mbar);Temp(Celsius)
POSI;Timestamp(s);Counter;Latitude(degrees); Longitude(degrees);floor ID(0,1,2..4);Building ID(0,1,2..3)
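To make the record layout above concrete, here is a minimal sketch of splitting one data line into its record tag and numeric fields with plain Python. The helper name `split_record` is hypothetical, and the sketch assumes every field after the tag is numeric, which holds for ACCE, GYRO, MAGN, PRES and GNSS lines but not for records with text fields such as WIFI or BLUE.

```python
def split_record(line):
    """Split 'TAG;v1;v2;...' into (tag, [float, ...]).

    Assumes all fields after the tag are numeric; empty fields
    (e.g. from a trailing ';') are skipped.
    """
    tag, *fields = line.strip().split(";")
    return tag, [float(f) for f in fields if f != ""]


tag, vals = split_record("ACCE;1.249;343268.934;-0.24900;0.53871;9.59625;3")
# vals[0] is the app timestamp, vals[1] the sensor timestamp,
# vals[2:5] the three axes and vals[5] the accuracy flag.
```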
A part of the RAW .txt logfile:
MAGN;1.249;343268.933;2.64000;-97.50000;-69.06000;0
GYRO;1.249;343268.934;0.02153;0.06943;0.09880;3
ACCE;1.249;343268.934;-0.24900;0.53871;9.59625;3
GNSS;1.250;1570711878.000;52.225976;5.174543;58.066;175.336;3.0;0.0;23;20
ACCE;1.253;343268.936;-0.26576;0.52674;9.58428;3
GYRO;1.253;343268.936;0.00809;0.06515;0.10002;3
ACCE;1.253;343268.938;-0.29450;0.49561;9.57710;3
GYRO;1.253;343268.938;0.00015;0.06088;0.10613;3
PRES;1.253;343268.929;1011.8713;3
GNSS;1.254;1570711878.000;52.225976;5.174543;58.066;175.336;3.0;0.0;23;20
ACCE;1.255;343268.940;-0.29450;0.49801;9.57710;3
GYRO;1.255;343268.940;-0.00596;0.05843;0.10979;3
ACCE;1.260;343268.942;-0.30647;0.50280;9.55795;3
GYRO;1.261;343268.942;-0.01818;0.05721;0.11529;3
MAGN;1.262;343268.943;2.94000;-97.74000;-68.88000;0
fileContent is a list holding the lines of the .txt file, as shown above.
Piece of the code:
import numpy as np

def parseValues(line):
    # Skip the 5-character record tag (e.g. "ACCE;") and parse the rest as floats.
    return np.fromstring(line[5:], dtype=float, sep=";")

i = 0
while i < len(fileContent):
    if fileContent[i][:4] == "ACCE":
        vals = parseValues(fileContent[i])
        # Row label: sensor timestamp relative to the first sensor timestamp.
        idx = vals[1] - initialSensTS
        df.at[idx, 'ax'] = vals[2]
        df.at[idx, 'ay'] = vals[3]
        df.at[idx, 'az'] = vals[4]
        df.at[idx, 'accStat'] = vals[5]
    i += 1
The code works, but it is extremely slow at some of the df.at[idx, 'xx'] lines; see line 28 in the line profiler output below.
Line profiler output:
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
    22         1          1.0      1.0      0.0  i = 0
    23    232250     542594.0      2.3      0.0  while i < len(fileContent):
    24    232249  294337000.0   1267.3     23.8      update_progress(i / len(fileContent))
    25    232249     918442.0      4.0      0.1      if (fileContent[i][:4] == "ACCE"):
    26     54602    1584625.0     29.0      0.1          vals = parseValues(fileContent[i])
    27     54602     316968.0      5.8      0.0          idx = vals[1] - initialSensTS
    28     54602  504189480.0   9233.9     40.8          df.at[idx, 'ax'] = vals[2]
    29     54602    8311109.0    152.2      0.7          df.at[idx, 'ay'] = vals[3]
    30     54602    4901983.0     89.8      0.4          df.at[idx, 'az'] = vals[4]
    31     54602    4428239.0     81.1      0.4          df.at[idx, 'accStat'] = vals[5]
    32     54602     132590.0      2.4      0.0          i += 1
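One plausible reading of the profile is that the first df.at assignment for each new timestamp (line 28) has to add a row to the DataFrame, while the following assignments hit a row that already exists. If that is the cause, a possible workaround is to collect the values in plain Python dictionaries and build the DataFrame once at the end. The sketch below is only an illustration under assumptions: the helper `build_frame` and the GYRO/MAGN column names are hypothetical, only three record types are mapped, and the raw sensor timestamp is used as the row label instead of subtracting initialSensTS.

```python
import pandas as pd

# Column names per record type. Only 'ax', 'ay', 'az', 'accStat' come from
# the code in the question; the GYRO and MAGN names are made up here.
COLUMNS = {
    "ACCE": ["ax", "ay", "az", "accStat"],
    "GYRO": ["gx", "gy", "gz", "gyrStat"],
    "MAGN": ["mx", "my", "mz", "magStat"],
}


def build_frame(lines):
    """Collect values per sensor timestamp, then build the DataFrame once."""
    rows = {}  # sensor timestamp -> {column name: value}
    for line in lines:
        tag, *fields = line.strip().split(";")
        names = COLUMNS.get(tag)
        if names is None:
            continue  # record types that are not needed are skipped
        vals = [float(f) for f in fields]
        sensor_ts = vals[1]
        rows.setdefault(sensor_ts, {}).update(zip(names, vals[2:]))
    # One DataFrame construction instead of one row insertion per timestamp.
    return pd.DataFrame.from_dict(rows, orient="index").sort_index()


sample = [
    "MAGN;1.249;343268.933;2.64000;-97.50000;-69.06000;0",
    "GYRO;1.249;343268.934;0.02153;0.06943;0.09880;3",
    "ACCE;1.249;343268.934;-0.24900;0.53871;9.59625;3",
    "ACCE;1.253;343268.936;-0.26576;0.52674;9.58428;3",
]
df = build_frame(sample)
csv_text = df.to_csv()  # pass a file path to write the CSV to disk instead
```

Cells for which no measurement exists at a given timestamp come out as NaN, which matches the sparse layout produced by the per-cell assignments. The progress update on line 24 also accounts for 23.8% of the runtime, so calling it less often than once per input line may be worth trying as well.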