Appending Rows to a Pandas DataFrame

Question

I'm trying to read an analog signal from a Measurement-Computing Personal Measurement Device (PMD-1208FS) and then write it to a file with a corresponding timestamp for each observation. I would like to append to this file once every second with a new observation.

The PyUniversalLibrary allows me to read from the device, but I'm stuck trying to figure out how to save the information into a dataframe. This example was helpful for reading data from the PMD, but it doesn't provide any data logging examples.

The example below gets close to solving this problem, but the df.append(pd.DataFrame() function is not providing me with the desired result. This function ends up appending the most recent dataframe to the bottom of the previously saved one, rather than just appending the new data. The result is a dataframe with many duplicate dataframes in sequence.

Here's my code:

## Source libraries:
from __future__ import print_function
import UniversalLibrary as UL
import time, os, io, csv, datetime
import pandas as pd

## Specify PMD settings:
BoardNum = 0
Gain = UL.BIP5VOLTS
Chan = 0

## Create empty lists and a dataframe to fill:
co = [] ## carbon monoxide concentration in ppm
data = [] ## raw analog output between 0-5V
times = [] ## timestamp
df = pd.DataFrame()


## Set filepath:
filename = "~/pmd_data.csv"

while True:
    ts = time.time()
    DataValue = UL.cbAIn(BoardNum, Chan, Gain)
    EngUnits = UL.cbToEngUnits(BoardNum, Gain, DataValue)
    ppm = EngUnits * 10 ## 1 Volt = 10ppm of carbon monoxide
    data.append(EngUnits)
    times.append(datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S'))
    co.append(ppm)
    ## This line of code is not providing the desired result:
    df = df.append(pd.DataFrame({'co':ppm, 'volts':data, 'datetime':times})) 
    print(df)
    df.to_csv(filename, sep = ',', index = False, encoding = 'utf-8')
    time.sleep(1)

Current Output:

    co    datetime    volts
0    13.8    2017-05-03 15:57:19   1.38
1    13.8    2017-05-03 15:57:19   1.38    
2    13.9    2017-05-03 15:57:20   1.39
3    13.8    2017-05-03 15:57:19   1.38
4    13.9    2017-05-03 15:57:20   1.39
5    14.2    2017-05-03 15:57:21   1.42

Desired Output:

    co    datetime    volts
0    13.8    2017-05-03 15:57:19   1.38
1    13.9    2017-05-03 15:57:20   1.39
2    14.2    2017-05-03 15:57:21   1.42

score 2 · Answer 1 · answered Oct 25 '17 at 19:46

If you're just looking to append, then you don't need a counter with .loc. You can just change it to df.loc[len(df)] = row . This will always write a new row at the end of the DataFrame.

Updated code from piRSquared code here:

## Source libraries:
from __future__ import print_function
import UniversalLibrary as UL
import time, os, io, csv, datetime
import pandas as pd

## Specify PMD settings:
BoardNum = 0
Gain = UL.BIP5VOLTS
Chan = 0

## Create empty lists and a dataframe to fill:
df = pd.DataFrame(columns=['co', 'volts', 'datetime'])

## Set filepath:
filename = "~/pmd_data.csv"

while True:
    ts = time.time()
    DataValue = UL.cbAIn(BoardNum, Chan, Gain)
    EngUnits = UL.cbToEngUnits(BoardNum, Gain, DataValue)
    ppm = EngUnits * 10 ## 1 Volt = 10ppm of carbon monoxide
    df.loc[len(df)] = pd.Series(dict(
            co=ppm, volts=EngUnits, datetime=ts
        ))
    ## This line of code is not providing the desired result:
    df.to_csv(filename, sep = ',', index = False, encoding = 'utf-8')
    time.sleep(1)

plasmon360 · Answer 2 · 2017-05-04T00:18:52.287

you are appending a dataframe with lists (that grow with time) for each field every time it enters a while loop. But you should be adding a dataframe with a list with only one element for each field at a time. please see example below

you are essentially doing this:

co = [] ## carbon monoxide concentration in ppm
data = [] ## raw analog output between 0-5V
times = [] ## timestamp

df = pd.DataFrame()
for i in range(0,5):
    data.append(i)
    times.append(i)
    co.append(i)
    df = df.append(pd.DataFrame({'co':co, 'volts':data, 'datetime':times}))
print df

which results in

   co  datetime  volts
0   0         0      0
0   0         0      0
1   1         1      1
0   0         0      0
1   1         1      1
2   2         2      2
0   0         0      0
1   1         1      1
2   2         2      2
3   3         3      3
0   0         0      0
1   1         1      1
2   2         2      2
3   3         3      3
4   4         4      4

but you should be doing this

df = pd.DataFrame()
for i in range(0,5):
    df = df.append(pd.DataFrame({'co':[i], 'volts':[i], 'datetime':[i]}))
print df

which results in

   co  datetime  volts
0   0         0      0
0   1         1      1
0   2         2      2
0   3         3      3
0   4         4      4

so your code should like

## Source libraries:
from __future__ import print_function
import UniversalLibrary as UL
import time, os, io, csv, datetime
import pandas as pd

## Specify PMD settings:
BoardNum = 0
Gain = UL.BIP5VOLTS
Chan = 0

## Create empty dataframe to fill:
df = pd.DataFrame()

## Set filepath:
filename = "~/pmd_data.csv"

while True:
    ts = time.time()
    DataValue = UL.cbAIn(BoardNum, Chan, Gain)
    EngUnits = UL.cbToEngUnits(BoardNum, Gain, DataValue)
    ppm = EngUnits * 10 ## 1 Volt = 10ppm of carbon monoxide
    times = (datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d %H:%M:%S'))
    df = df.append(pd.DataFrame({'co':[ppm], 'volts':[EngUnits], 'datetime':[times]})) 
    print(df)
    df.to_csv(filename, sep = ',', index = False, encoding = 'utf-8')
    time.sleep(1)

Thanks for your input. I really need the while loop to work for this application, as I don't have an idea of my data's range ahead of time. — philiporlando, May 03 '17 at 23:58
Glad to help. I just used for loop as a demonstration, you can still use your while loop. see my latest edit. — plasmon360, May 04 '17 at 00:19

score 1 · Accepted Answer · answered May 03 '17 at 23:35

Since you don't use the index specifically, I'd keep a counter and use it to add a new row to an existing dataframe.

I'd rewrite the while loop like this

## Source libraries:
from __future__ import print_function
import UniversalLibrary as UL
import time, os, io, csv, datetime
import pandas as pd

## Specify PMD settings:
BoardNum = 0
Gain = UL.BIP5VOLTS
Chan = 0

## Create empty lists and a dataframe to fill:
df = pd.DataFrame(columns=['co', 'volts', 'datetime'])

## Set filepath:
filename = "~/pmd_data.csv"

counter = 0
while True:
    ts = time.time()
    DataValue = UL.cbAIn(BoardNum, Chan, Gain)
    EngUnits = UL.cbToEngUnits(BoardNum, Gain, DataValue)
    ppm = EngUnits * 10 ## 1 Volt = 10ppm of carbon monoxide
    df.loc[counter] = pd.Series(dict(
            co=ppm, volts=EngUnits, datetime=ts
        ))
    ## This line of code is not providing the desired result:
    counter += 1
    df.to_csv(filename, sep = ',', index = False, encoding = 'utf-8')
    time.sleep(1)

This did the trick! Thank you so much for your help! I am new to python and this taught me how to use `columns` and `df.loc[counter]` in a way that I won't forget. Do you have any quick recommendations as to the best way to print the output to the console? I'm using `print(df.loc[counter])` at the moment, but the output stacks my columns in long format instead of the desired wide format. I'm now able to acquire data from a +20 year old air quality instrument! Thanks again! — philiporlando, May 03 '17 at 23:56
@spacedSparking Glad I can add life to that instrument. Try `print(df.loc[[counter]])` — piRSquared, May 03 '17 at 23:58

Appending Rows to a Pandas DataFrame

Here's my code:

Current Output:

Desired Output:

3 Answers3