Creating numpy array from pandas dataframe

Question

import pandas as pd
import numpy as np
df = pd.read_csv('~/test.txt')
list(df.columns.values)

I get the following output:

['time', 'Res_fs1', 'angle1', 'Res_fs2', 'angle2', 'Res_ps1', 'Force1', 
'Res_ps2', 'Force2', 'object']

When I try to create a NumPy array from the columns Res_fs1, Res_fs2, Res_ps1 and Res_ps2:

X=np.array(df['Res_fs1','Res_fs2','Res_ps1','Res_ps2'])

I get a KeyError even though the keys exist:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1969, in __getitem__
    return self._getitem_column(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1976, in _getitem_column
    return self._get_item_cache(key)
  File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 1091, in _get_item_cache
    values = self._data.get(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/internals.py", line 3211, in get
    loc = self.items.get_loc(item)
  File "/usr/lib/python2.7/dist-packages/pandas/core/index.py", line 1759, in get_loc
    return self._engine.get_loc(key)
  File "pandas/index.pyx", line 137, in pandas.index.IndexEngine.get_loc (pandas/index.c:3979)
  File "pandas/index.pyx", line 157, in pandas.index.IndexEngine.get_loc (pandas/index.c:3843)
  File "pandas/hashtable.pyx", line 668, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12265)
  File "pandas/hashtable.pyx", line 676, in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12216)
KeyError: ('Res_fs1', 'Res_fs2', 'Res_ps1', 'Res_ps2')

score 2 · Answer 1 · answered Jun 28 '17 at 01:33

You can just do:

X = df[['Res_fs1','Res_fs2','Res_ps1','Res_ps2']].values

When you select a subset of columns, you need to use double square brackets, '[[' and ']]'.
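As a minimal sketch of that selection, assuming a small made-up DataFrame standing in for the contents of test.txt (the values and the reduced column set are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the data in ~/test.txt; values are made up.
df = pd.DataFrame({
    'time': [0.0, 0.1, 0.2],
    'Res_fs1': [1.0, 2.0, 3.0],
    'Res_fs2': [4.0, 5.0, 6.0],
    'Res_ps1': [7.0, 8.0, 9.0],
    'Res_ps2': [10.0, 11.0, 12.0],
})

# The inner brackets build a list of column names; the outer brackets index df.
cols = ['Res_fs1', 'Res_fs2', 'Res_ps1', 'Res_ps2']
X = df[cols].values

print(type(X).__name__, X.shape)  # ndarray (3, 4)
```

Each row of X corresponds to one row of the DataFrame, with the columns in the order given in the list.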

Allen Qin


More specifically, the outer brackets are syntactic sugar for the getitem method and it takes a single argument. Without the inner brackets you are passing many arguments to that method. With the inner brackets you are passing a list which it knows how to handle – piRSquared Jun 28 '17 at 01:39
The `__getitem__` special method is not passed many arguments if there are no brackets. Instead, it's passed a tuple. See the key error – Ted Petrou Jun 28 '17 at 01:45

score 2 · Answer 2 · edited Jun 20 '20 at 09:12

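pandas has a built-in method for this purpose: pandas.DataFrame.as_matrix. Note that as_matrix was deprecated in pandas 0.23 and removed in pandas 1.0; on current versions, use DataFrame.to_numpy() or the .values attribute instead.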

DataFrame.as_matrix(columns=None)

Convert the frame to its Numpy-array representation.
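On pandas 0.24 or later, the same conversion can be sketched with DataFrame.to_numpy. The DataFrame below is a made-up stand-in for the asker's data, not the real file:

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column names come from the question.
df = pd.DataFrame({
    'Res_fs1': [1.0, 2.0],
    'Res_fs2': [3.0, 4.0],
    'Res_ps1': [5.0, 6.0],
    'Res_ps2': [7.0, 8.0],
    'object': ['a', 'b'],
})

# Select the numeric columns first, then convert to a NumPy array.
X = df[['Res_fs1', 'Res_fs2', 'Res_ps1', 'Res_ps2']].to_numpy()

print(X.shape)  # (2, 4)
```

Selecting the columns before converting keeps the string column 'object' out of the array, so the result stays a float array rather than falling back to dtype object.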

answered Jun 28 '17 at 07:08

techiegirl123


score 1 · Answer 3 · answered Jun 28 '17 at 02:06

To really understand what is happening, you need to know how Python handles the indexing operator (the square brackets). Internally, the square brackets are special syntax for calling an object's __getitem__ special method. If the object does not implement that method, you get a TypeError saying that the object does not support indexing.

When you call df['Res_fs1','Res_fs2','Res_ps1','Res_ps2'], Python interprets the comma separated columns as a tuple. It sends the tuple to the __getitem__ special method of the DataFrame.

Internally, this is what gets called.

df.__getitem__(('Res_fs1','Res_fs2','Res_ps1','Res_ps2'))
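To see what the brackets actually hand to __getitem__, here is a small sketch using a toy class (ShowKey is a hypothetical name invented for this illustration, not part of pandas):

```python
class ShowKey:
    """Toy class that reports the type and value __getitem__ receives."""

    def __getitem__(self, key):
        return type(key).__name__, key


obj = ShowKey()

# Comma-separated keys arrive as a single tuple argument.
tuple_result = obj['a', 'b']
# Inner brackets pass a single list argument.
list_result = obj[['a', 'b']]

print(tuple_result)  # ('tuple', ('a', 'b'))
print(list_result)   # ('list', ['a', 'b'])
```

In both cases __getitem__ receives exactly one argument; only its type differs.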

Tuples are immutable and hashable (provided their elements are hashable), so a tuple is a valid candidate for an index label. pandas therefore looks for a single column whose name is exactly the tuple ('Res_fs1','Res_fs2','Res_ps1','Res_ps2'). Since your DataFrame has no such column, a KeyError is raised.

When you call df[['Res_fs1','Res_fs2','Res_ps1','Res_ps2']], the __getitem__ special method is passed a list. Lists cannot be hashed and therefore cannot be members of the index. pandas therefore takes a completely different path and retrieves every column whose name is in the passed list. It raises a KeyError if any item in the list is not a column name.
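The two paths can be compared side by side in a short sketch. The two-column DataFrame and the name 'missing' are made up for illustration:

```python
import pandas as pd

# Hypothetical two-column frame; the values are made up.
df = pd.DataFrame({'Res_fs1': [1, 2], 'Res_fs2': [3, 4]})

# Tuple key: pandas looks for one column named ('Res_fs1', 'Res_fs2').
try:
    df['Res_fs1', 'Res_fs2']
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

# List key: pandas selects each listed column.
subset = df[['Res_fs1', 'Res_fs2']]

# List key containing a name that is not a column.
try:
    df[['Res_fs1', 'missing']]
    missing_lookup_failed = False
except KeyError:
    missing_lookup_failed = True

print(tuple_lookup_failed)      # True
print(list(subset.columns))     # ['Res_fs1', 'Res_fs2']
print(missing_lookup_failed)    # True
```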
