Appending data into pandas dataframe

Question

I'm building a system where a Raspberry Pi receives data via Bluetooth and parses it into a pandas DataFrame for further processing. However, there are a few issues. The Bluetooth packets are converted into a pandas Series object, which I tried, unsuccessfully, to append to an empty DataFrame. The splitting below is performed in order to extract telemetry from a Bluetooth packet.

The code creates a suitable DataFrame with the correct column names, but when I append to it, the Series object's row numbers become new columns. Each appended Series is a single row in the final DataFrame. What I want to know is: how do I add a Series object to the DataFrame so that its values are put into the columns with indices from 0 to 6 instead of from 7 to 14?

Edit: Added a screenshot with the output at the top and several pkt values below.

Edit2: Added full code per request. Added error traceback.

import time
import sys
import subprocess
import pandas as pd
import numpy as np

class Scan:
    def __init__(self, count, columns):
        self.running = True
        self.count = count
        self.columns = columns

    def run(self):
        i_count = 0
        p_data = pd.DataFrame(columns=self.columns, dtype='str')

        while self.running:
            output = subprocess.check_output(["commands", "to", "follow.py"]).decode('utf-8')
            p_rows = output.split(";")
            series_list = []
            print(len(self.columns))

            for packet in p_rows:
                pkt = pd.Series(packet.split(","),dtype='str', index=self.columns)
                pkt = pkt.replace('\n','',regex=True)
                print(len(pkt))
                series_list.append(pkt)
            p_data = pd.DataFrame(pd.concat(series_list, axis=1)).T

            print(p_data.head())
            print(p_rows[0])
            print(list(p_data.columns.values))

            if i_count  == self.count:
                self.running = False
                sys.exit()
            else:
                i_count += 1
            time.sleep(10)

def main():
    columns = ['mac', 'rssi', 'voltage', 'temperature', 'ad count', 't since boot', 'other']
    scan = Scan(0, columns)

    while True:
        scan.run()

if __name__ == '__main__':
    main()

Traceback (most recent call last):
  File "blescanner.py", line 48, in <module>
    main()
  File "blescanner.py", line 45, in main
    scan.run()
  File "blescanner.py", line 24, in run
    pkt = pd.Series(packet.split(","),dtype='str', index=self.columns)
  File "/mypythonpath/site-packages/pandas/core/series.py", line 262, in __init__
    .format(val=len(data), ind=len(index)))
ValueError: Length of passed values is 1, index implies 7

Possibly the most important bit is the structure of `output`. Can you show us what this looks like? — jpp, Aug 07 '18 at 13:33
Please don't add images / links, just text. See also [mcve]. — jpp, Aug 07 '18 at 13:59
In your second edit, are you sure your indentation is correct? As presented, your while loop is going to run to completion before iterating over `p_rows`. I assume that `for packet in p_rows` is supposed to be under your `while self.running` — dan_g, Aug 07 '18 at 15:12

dan_g · Accepted Answer · 2018-08-07T15:34:47.193

1

You don't want to append to a DataFrame in that way. What you can do instead is create a list of series, and concatenate them together.

So, something like this:

series_list = []
for packet in p_rows:
    pkt = pd.Series(packet.split(","),dtype='str')
    print(pkt)
    series_list.append(pkt)
p_data = pd.DataFrame(pd.concat(series_list), columns=self.columns, dtype='str')

As long as you don't specify ignore_index=True in the pd.concat call, the index will not be reset (the default is ignore_index=False).
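To illustrate the ignore_index remark, here is a minimal sketch using two made-up Series (the names `a`, `b`, `kept` and `reset` are illustrative and not part of the original code):

```python
import pandas as pd

# Two short Series, each with its own default 0..1 index
a = pd.Series(["x", "y"])
b = pd.Series(["z", "w"])

kept = pd.concat([a, b])                      # original labels survive: 0, 1, 0, 1
reset = pd.concat([a, b], ignore_index=True)  # labels are renumbered: 0, 1, 2, 3

print(list(kept.index))
print(list(reset.index))
```

With the default, each Series keeps its own labels, so labels repeat in the concatenated result.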

Edit:

It's not clear from your question, but if you're trying to add the series as new columns (instead of stacking them on top of each other), then change the last line from above to:

p_data = pd.concat(series_list, axis=1)
p_data.columns = self.columns

Edit2:

Still not entirely clear, but it sounds (from your edit) like you want to transpose the series so that each one becomes a row, with the index of the series becoming your columns. I.e.:

series_list = []
for packet in p_rows:
    pkt = pd.Series(packet.split(","), dtype='str', index=self.columns)
    series_list.append(pkt)
p_data = pd.DataFrame(pd.concat(series_list, axis=1)).T

Edit 3: Based on your picture of output, when you split on ; the last element in your list is empty. E.g.:

output = """f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None;
            f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None;"""

output.split(';')

['f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None',
 '\n            f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None',
 '']

So instead of `for packet in p_rows`, do `for packet in p_rows[:-1]`.
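A small sketch of why that empty last element triggers the ValueError, assuming the output ends with `;` followed by a newline as in the sample above. The filtering step at the end is a hypothetical alternative to `p_rows[:-1]`, not something from the original code; it skips blank segments wherever they occur.

```python
import pandas as pd

columns = ['mac', 'rssi', 'voltage', 'temperature', 'ad count', 't since boot', 'other']
output = "f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None;\n"

segments = output.split(";")

# The text after the final ';' is only a newline. Splitting it on ','
# yields a single empty field rather than seven values.
last_fields = segments[-1].strip().split(",")
print(last_fields)

# One value against a seven-label index is what raises the ValueError
try:
    pd.Series(last_fields, dtype='str', index=columns)
    raised = False
except ValueError:
    raised = True
print(raised)

# Hypothetical alternative to p_rows[:-1]: keep only non-blank segments
packets = [s.strip() for s in segments if s.strip()]
print(packets)
```

The filter does not depend on the blank segment being last, which may matter if the scanner output ever contains `;;` or a leading separator.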

Full example:

import pandas as pd

columns = ['mac', 'rssi', 'voltage', 'temperature', 'ad count', 't since boot', 'other']

output = """f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None;
            f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None;"""
p_rows = output.split(";")
series_list = []

for packet in p_rows[:-1]:
    pkt = pd.Series(packet.strip().split(","), dtype='str', index=columns)
    series_list.append(pkt)
p_data = pd.DataFrame(pd.concat(series_list, axis=1)).T

produces

                 mac rssi voltage temperature  ad count t since boot other
0  f1:07:ad:6b:97:c8  -24    2800       23.00  17962365     25509655  None
1  f1:07:ad:6b:97:c8  -24    2800       23.00  17962365     25509655  None
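As a possible simplification (a sketch, not the approach used in this answer), the same table can probably be built without intermediate Series objects by passing a list of field lists straight to the DataFrame constructor. The variable name `rows` is illustrative.

```python
import pandas as pd

columns = ['mac', 'rssi', 'voltage', 'temperature', 'ad count', 't since boot', 'other']

output = """f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None;
            f1:07:ad:6b:97:c8,-24,2800,23.00,17962365,25509655,None;"""

# One list of seven string fields per non-blank packet
rows = [p.strip().split(",") for p in output.split(";") if p.strip()]

# Each inner list becomes one row; the values stay as strings
p_data = pd.DataFrame(rows, columns=columns)
print(p_data)
```

This builds the frame in a single constructor call, so nothing is appended inside the loop.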

edited Aug 07 '18 at 15:34

answered Aug 07 '18 at 14:04

dan_g


Each series corresponds to a row in the dataframe, will edit accordingly. – Minregx Aug 07 '18 at 14:15
So, each series is meant to be a row of multiple columns? – dan_g Aug 07 '18 at 14:18
Yes, hence the splitting: each value between commas is sent to its own column in the dataframe. – Minregx Aug 07 '18 at 14:25
OK, see edit #2. This should achieve what you want. You don't want to append to a DataFrame in a loop like you're doing, since each append copies the whole DataFrame, which becomes inefficient as the size of the DataFrame grows. – dan_g Aug 07 '18 at 14:30
The code in edit2 results in ValueError: Length of passed values is 1, index implies 7. – Minregx Aug 07 '18 at 14:37
What is the length of each series produced by `packet.split(",")`? I was assuming that it would be the same length as your list of columns, if your goal is to map each value between the commas to a column. Can you post what a single series looks like? – dan_g Aug 07 '18 at 14:40
print(len(pkt)) results in a 7 – Minregx Aug 07 '18 at 14:42
And `len(self.columns)` is also 7? – dan_g Aug 07 '18 at 14:44
Can you post the code along with the full traceback? If they are the same shape then I don't see what would be producing that error. For example, this: `pd.Series('1,2'.split(','), dtype='str', index=['a', 'b'])` works as expected. – dan_g Aug 07 '18 at 14:49
@Minregx you are getting the `ValueError` because of the shape of `pkt` - it's one column and seven rows. Mitigate this by casting it to a dataframe and using the `T` function, and you can then append the data to your existing df. See edit above for working code example. – Joe Plumb Aug 07 '18 at 14:55
Again, you don't want to append to a df in a loop. If `pkt` is a series of length 7, passing a list of length 7 as the index should not produce a `ValueError`, which implies something else is wrong with the code. – dan_g Aug 07 '18 at 14:58
@Minregx see edit #3. If that doesn't fix it can you please provide an example of what `output` is as text (rather than a picture) – dan_g Aug 07 '18 at 15:30
Thanks a bunch, it works now! I suppose one small `strip()` in the middle glued it all together. – Minregx Aug 07 '18 at 15:58
No problem. Just FYI, the `strip()` just removed the newline white space. The problem was the last item in the list after splitting on `;` was just an empty string since (I'm assuming) the last character of the last line is `;`. So we ignore that last item when iterating over your rows by doing `p_rows[:-1]` – dan_g Aug 07 '18 at 16:05

Joe Plumb · Answer 2 · 2018-08-07T14:53:42.743

This is because of conflicting keys between the p_data df and pkt data in your append statement - you need to ensure that the keys in pkt match the column headings in the p_data dataframe you are appending to.

Fix this by either re-naming the columns in the p_data dataframe to the numbers you are seeing in the pkt, or by re-naming the keys in pkt before you append the data.

Edit: Following further discussion, I agree that column names do not come into it, as the incoming data is in the same order as the existing df. Simply wrap pd.DataFrame() around the pkt object and make sure the row of data is in the right shape when appending to achieve the desired result.

import pandas as pd
import numpy as np

# Set initial df with data
d = pd.DataFrame(['f1:07:ad:6b:97:c8', '-23', '2900', '24.00', '17962371', '25509685', 'None']).T
p_data = pd.DataFrame(data=d, dtype='str')

# Parse new incoming data
output = "f1:07:ad:6b:97:c8;-24;2800;23.00;17962365;25509655;None"
pkt = output.split(";")

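# Append new data to existing dataframe as a single row (pkt, not the undefined p_rows).
# This was originally written with p_data.append(...), which was removed in pandas 2.0;
# pd.concat with ignore_index=True gives the same result.
p_data = pd.concat([p_data, pd.DataFrame(data=pkt).T], ignore_index=True)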
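To make the "right shape" point concrete, here is a short sketch comparing the shapes involved. The names `as_column`, `as_row` and `also_row` are illustrative only.

```python
import pandas as pd

pkt = "f1:07:ad:6b:97:c8;-24;2800;23.00;17962365;25509655;None".split(";")

as_column = pd.DataFrame(pkt)    # flat list -> one column, seven rows
as_row = pd.DataFrame(pkt).T     # transposed -> one row, seven columns
also_row = pd.DataFrame([pkt])   # nested list -> one row, no transpose needed

print(as_column.shape, as_row.shape, also_row.shape)
```

Only the one-row forms line up with an existing seven-column frame.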

I thought of that, however, I get a typeerror saying can not append a Series unless ignore_index=True. — Minregx, Aug 07 '18 at 14:08
@Minregx Understood, I think the solution then will be to transform the `pkt` series into a df, and then append. You can do this inline (line 10 in your example) - see this answer for more details: https://stackoverflow.com/a/37909639/3446927 — Joe Plumb, Aug 07 '18 at 14:14
