3

I need help. I'm trying to filter the rows collected after 10769 s (according to the elapsed_seconds column) and write them, together with the acceleration magnitude, to another CSV file. However, I'm getting KeyError: 0...

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv(accelDataPath)
data.columns = ['t', 'x', 'y', 'z']

# calculate the magnitude of acceleration 
data['m'] = np.sqrt(data['x']**2 + data['y']**2 + data['z']**2)

data['datetime'] = pd.DatetimeIndex(pd.to_datetime(data['t'], unit = 'ms').dt.tz_localize('UTC').dt.tz_convert('US/Eastern'))
data['elapsed_seconds'] = (data['datetime'] -  data['datetime'].iloc[0]).dt.total_seconds()
i=0
csv = open("filteredData.csv", "w+")
csv.write("Event at, Magnitude \n")
while (i < len(data[data.elapsed_seconds > 10769])):
   csv.write(str(data[data.elapsed_seconds > 10769][i]) + ", " + str(data[data.m][i]) + "\n")
csv.close()

The error I am getting is:

Traceback (most recent call last):
  File "C:\Users\Desktop\AnalyzingData.py", line 37, in <module>
    csv.write(str(data[data.elapsed_seconds > 10769][i]) + ", " + str(data[data.m][i]) + "\n")
  File "C:\python\lib\site-packages\pandas\core\frame.py", line 1964, in __getitem__
    return self._getitem_column(key)
  File "C:\python\lib\site-packages\pandas\core\frame.py", line 1971, in _getitem_column
    return self._get_item_cache(key)
  File "C:\python\lib\site-packages\pandas\core\generic.py", line 1645, in _get_item_cache
    values = self._data.get(item)
  File "C:\python\lib\site-packages\pandas\core\internals.py", line 3590, in get
    loc = self.items.get_loc(item)
  File "C:\python\lib\site-packages\pandas\core\indexes\base.py", line 2444, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5280)
  File "pandas\_libs\index.pyx", line 154, in pandas._libs.index.IndexEngine.get_loc (pandas\_libs\index.c:5126)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1210, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20523)
  File "pandas\_libs\hashtable_class_helper.pxi", line 1218, in pandas._libs.hashtable.PyObjectHashTable.get_item (pandas\_libs\hashtable.c:20477)
KeyError: 0
Max Bethke
user2995019
  • It's quite obvious that your dataframe does not have a column called `0`. Yet `data[data.elapsed_seconds > 10769][i]` selects column `0` when `i == 0`. Also, it looks like you're writing an infinite loop... – IanS Oct 05 '17 at 09:54
  • Thank you for the clarification. May I know how do I initialize such that the first line of data[data.elapsed_seconds > 10769] is the first line? – user2995019 Oct 05 '17 at 10:04
  • Got it. Try `data[data.elapsed_seconds > 10769].iloc[i]`, this selects the first row. – IanS Oct 05 '17 at 10:07
  • Possible duplicate of [Pandas - Get first row value of a given column](https://stackoverflow.com/questions/25254016/pandas-get-first-row-value-of-a-given-column) – IanS Oct 05 '17 at 10:08
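
To illustrate the distinction drawn in the comments, here is a minimal sketch on a small made-up frame (the values are hypothetical, not the asker's data). With string column labels, square brackets with `0` are a column lookup and raise KeyError, whereas `.iloc[0]` selects the first row by position, even when the filtered index no longer starts at 0.

```python
import pandas as pd

# Hypothetical stand-in for the asker's data; the values are made up.
df = pd.DataFrame({
    "elapsed_seconds": [10768.0, 10770.0, 10771.0],
    "m": [0.20, 0.21, 0.22],
})

filtered = df[df.elapsed_seconds > 10769]

# Square brackets with a scalar look up a column label,
# and this frame has no column labelled 0.
try:
    filtered[0]
    raised = False
except KeyError:
    raised = True

# .iloc selects by position, so 0 means "first remaining row",
# regardless of the labels left in the filtered index.
first_row = filtered.iloc[0]
first_elapsed = filtered["elapsed_seconds"].iloc[0]

print(raised)
print(list(filtered.index))
print(first_elapsed)
```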

2 Answers

1

Change this line:

csv.write(
    str(data[data.elapsed_seconds > 10769][i]) + ", " + str(data[data.m][i]) + "\n"
    )

to this:

csv.write(
   str(data[data.elapsed_seconds > 10769].iloc[i]) + ", " + str(data[data.m].iloc[i]) +"\n"
   )

Also, note that the while loop never increments i (for example with i += 1), so it will never terminate.
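
If an explicit loop is preferred, one possible shape is sketched below. It assumes a small made-up frame in place of the asker's data, filters once, and iterates over the filtered rows directly so that no counter has to be incremented. `io.StringIO` stands in for the output file purely to keep the sketch self-contained.

```python
import io
import pandas as pd

# Hypothetical stand-in for the asker's data; the values are made up.
data = pd.DataFrame({
    "elapsed_seconds": [10768.0, 10770.0, 10771.0],
    "m": [0.20, 0.21, 0.22],
})

# Filter once, then reuse the result instead of re-filtering on every pass.
filtered = data[data.elapsed_seconds > 10769]

out = io.StringIO()  # stand-in for open("filteredData.csv", "w")
out.write("Event at, Magnitude\n")

# Iterating over the rows removes the need for a manual counter.
for row in filtered.itertuples(index=False):
    out.write(str(row.elapsed_seconds) + ", " + str(row.m) + "\n")

result = out.getvalue()
print(result)
```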


Or, better, use df.to_csv as follows:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = pd.read_csv(accelDataPath)
data.columns = ['t', 'x', 'y', 'z']

# calculate the magnitude of acceleration 
data['m'] = np.sqrt(data['x']**2 + data['y']**2 + data['z']**2)

data['datetime'] = pd.DatetimeIndex(pd.to_datetime(data['t'], unit = 'ms').dt.tz_localize('UTC').dt.tz_convert('US/Eastern'))
data['elapsed_seconds'] = (data['datetime'] -  data['datetime'].iloc[0]).dt.total_seconds()

# write to csv using data.to_csv 
data[data.elapsed_seconds > 10769][['elapsed_seconds', 'm']].to_csv("filteredData.csv", 
            sep=",", 
            index=False)
Mohamed Ali JAMAOUI
  • It doesn't work... It produce another set of error... raise KeyError('%s not in index' % objarr[mask]) KeyError: '[ 0.20489437 0.21313549 0.22022774 ..., 1.53666405 1.31770629\n 1.55170659] not in index' – user2995019 Oct 05 '17 at 10:54
  • Index([u't', u'x', u'y', u'z', u'm', u'datetime', u'elapsed_seconds'], dtype='object') – user2995019 Oct 05 '17 at 11:11
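
The second KeyError quoted in the comment lists floating-point values, which is what one would expect from an expression like `data[data.m]`: it passes the values of column `m` to the frame as if they were column labels. That reading is an inference from the error message, not something confirmed in the thread. A minimal sketch on made-up data, selecting rows with a boolean mask and columns by name in a single `.loc` call:

```python
import io
import pandas as pd

# Hypothetical stand-in for the asker's data; the values are made up.
data = pd.DataFrame({
    "elapsed_seconds": [10768.0, 10770.0, 10771.0],
    "m": [0.20, 0.21, 0.22],
})

# data[data.m] treats the floats stored in column m as column labels,
# none of which exist, so pandas raises KeyError.
try:
    data[data.m]
    raised = False
except KeyError:
    raised = True

# Rows via a boolean mask, columns via their names.
mask = data["elapsed_seconds"] > 10769
subset = data.loc[mask, ["elapsed_seconds", "m"]]

buf = io.StringIO()  # stand-in for the path "filteredData.csv"
subset.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)
```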
0

I had the same issue, which was resolved by following this recommendation. In short, instead of df[0], use an explicit column label such as df['columnname'].

Tms91
Brijesh