pandas header does exist but still getting KeyError

Question

I am getting a KeyError: 'stack_over_flow' when I run my code.

e0 = pd.read_csv(working_dir+"E0.txt",sep=',')
e0['MTM'] = e0['stack_over_flow']

I printed the columns of e0, and stack_over_flow does appear among them:

b'Super_user'
b'Personal_finance'
b'stack_over_flow'

I also tried removing the b manually from the .txt file and still get the same error. Can anyone help with this?

traceback:

Traceback (most recent call last):

  File "<ipython-input-74-99e71d524b4b>", line 1, in <module>
    runfile('C:/AppData/FinRecon/py_code/python3/DataJoin.py', wdir='C:/AppData/FinRecon/py_code/python3')

  File "C:\Users\stack\AppData\Local\Continuum\anaconda3\anaconda3_32bit\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)

  File "C:\Users\stack\AppData\Local\Continuum\anaconda3\anaconda3_32bit\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "C:/AppData/FinRecon/py_code/python3/DataJoin.py", line 474, in <module>
    M2()

  File "C:/AppData/FinRecon/py_code/python3/DataJoin.py", line 41, in M2
    e0['MTM'] = e0['stack_over_flow']

  File "C:\Users\stack\AppData\Local\Continuum\anaconda3\anaconda3_32bit\lib\site-packages\pandas\core\frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)

  File "C:\Users\stack\AppData\Local\Continuum\anaconda3\anaconda3_32bit\lib\site-packages\pandas\core\indexes\base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))

  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc

  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc

  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item

  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'stack_over_flow'

Update: I figured it out. The problem is the b and the quotes ('') wrapped around each header. Why does this get added to my .txt file?

[Provide a copy of the data](https://stackoverflow.com/questions/52413246/how-do-i-provide-a-reproducible-copy-of-my-existing-dataframe) or the csv file. — Trenton McKinney, Aug 19 '19 at 16:53

ajayramesh · Accepted Answer · 2019-08-19T18:03:07.277

1

I changed your data slightly. Say the data in E0.txt looks like this:

stackoverflow,"some column name", test
1, 2, 3

Use the code below to retrieve the contents of a column.

e0 = pd.read_csv(working_dir+"E0.txt",sep=',')
e0['MTM'] = e0['stackoverflow']

-- update --

Without the b prefixes, I created a test sample, and the code works for the input below:

Super_user,Personal_finance,stack_over_flow
1, 2, 3
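To see why the lookup fails on the original file, here is a minimal sketch. It assumes the header line looks like `b'Super_user',b'Personal_finance',b'stack_over_flow'`, which is reconstructed from the column names printed in the question and not taken from the actual file. `clean_header` is a hypothetical helper, not a pandas function.

```python
import io
import re

import pandas as pd

# Assumed file content, reconstructed from the columns printed in the question.
raw = "b'Super_user',b'Personal_finance',b'stack_over_flow'\n1,2,3\n"

e0 = pd.read_csv(io.StringIO(raw), sep=",")

# repr() exposes the exact characters pandas stored for each column name.
print([repr(c) for c in e0.columns])


def clean_header(name):
    """Hypothetical helper: drop surrounding whitespace, an optional
    leading b, and one pair of wrapping quotes."""
    match = re.fullmatch(r"\s*b?(['\"])(.*)\1\s*", name)
    return match.group(2) if match else name.strip()


# Normalize the names once, then the plain label works.
e0.columns = [clean_header(c) for c in e0.columns]
e0["MTM"] = e0["stack_over_flow"]
print(list(e0.columns))
```

Because pandas treats only `"` as a quote character by default, the single quotes and the `b` stay part of the column name, so removing the `b` alone still leaves `'stack_over_flow'` (with quotes) as the label.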

edited Aug 19 '19 at 18:03

answered Aug 19 '19 at 16:54


I'm not sure what change I have to make here. Are you just showing that it works for you? – excelguy Aug 19 '19 at 17:27
you need to change the content of the file to UTF or something simple with 'b' in that. – ajayramesh Aug 19 '19 at 17:36
tried this, `e0 = pd.read_csv(working_dir+"E0.txt",sep=',',encoding='utf-8')` – excelguy Aug 19 '19 at 17:37
Why do you need `b` in your text input? For me, it worked once I removed every `b`. – ajayramesh Aug 19 '19 at 18:01
I think a previous process adds the `b`; I'm not sure why this was done or what the purpose was. But I did remove the b's and still got the same issue. – excelguy Aug 19 '19 at 18:08
Ah, I also removed the quotes `''` around each column name and it worked. However, I am unsure why these are being added to my .txt files. – excelguy Aug 19 '19 at 18:40
How is the file generated? Is some program writing it? What language is used? – ajayramesh Aug 19 '19 at 19:10
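One common way a `b'...'` wrapper ends up in a text file is a Python 3 program formatting `bytes` values with `str()` before writing them. The thread never confirms how E0.txt was produced, so treat this as an assumption. A small sketch of the difference:

```python
# Assumed upstream data: header names held as bytes objects.
header = [b"Super_user", b"Personal_finance", b"stack_over_flow"]

# str() on bytes keeps the b'...' wrapper as literal characters.
bad_line = ",".join(str(h) for h in header)

# Decoding first produces plain text with no wrapper.
good_line = ",".join(h.decode("utf-8") for h in header)

print(bad_line)
print(good_line)
```

If the writer is indeed Python, decoding the bytes (or opening the source in text mode) before writing would keep the prefixes out of the file in the first place.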

score 0 · Answer 2 · answered Aug 19 '19 at 16:49

Assuming the pandas DataFrame is loaded, select the column like this:

df[['stack_over_flow']]
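Note that double brackets only change the return type; they do not change how the label is looked up. A quick sketch, assuming a frame whose column name still carries the `b'...'` wrapper as in the question:

```python
import pandas as pd

# Assumed frame: one column whose stored name includes the b'...' wrapper.
df = pd.DataFrame({"b'stack_over_flow'": [3]})

outcomes = []
for key in ("stack_over_flow", ["stack_over_flow"]):
    try:
        df[key]
        outcomes.append("ok")
    except KeyError:
        # Both the single label and the list of labels fail the same way.
        outcomes.append("KeyError")

print(outcomes)

# With the exact stored name, single brackets return a Series
# and double brackets return a DataFrame.
series = df["b'stack_over_flow'"]
frame = df[["b'stack_over_flow'"]]
print(type(series).__name__, type(frame).__name__)
```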

Gokul Krishnan R


vtnate · Answer 3 · 2019-08-19T17:33:39.427

0

I suspect the problem lies in the encoding when reading the csv. Try adding encoding='utf-8' inside your read_csv call.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html

edited Aug 19 '19 at 17:33

answered Aug 19 '19 at 16:50


same error after adding this, `e0 = pd.read_csv(working_dir+"E0.txt",sep=',',encoding='utf-8')` – excelguy Aug 19 '19 at 17:26
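The result in that comment is what you would expect if the `b` and the quotes are ordinary ASCII characters in the file: the encoding only controls how bytes become text, so it cannot remove them. A minimal sketch, again assuming the header shown in the question's column list:

```python
import io

import pandas as pd

# Assumed raw file bytes, reconstructed from the question's column list.
raw_bytes = b"b'Super_user',b'Personal_finance',b'stack_over_flow'\n1,2,3\n"

default_cols = list(pd.read_csv(io.BytesIO(raw_bytes), sep=",").columns)
utf8_cols = list(
    pd.read_csv(io.BytesIO(raw_bytes), sep=",", encoding="utf-8").columns
)

# Same names either way, and the bare label is still missing.
print(default_cols == utf8_cols)
print("stack_over_flow" in utf8_cols)
```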
