
I am using a pandas data frame to clean and process data. However, I then need to convert it into a numpy ndarray in order to exploit matrix multiplication. I turn the data frame into a list of lists with the following:

x = df.tolist()

This returns the following structure:

[[1, 2], [3, 4], [5, 6], [7, 8] ...]

I then convert it into a numpy array like this:

x = np.array(x)

However, the following print:

print(type(x))
print(type(x[0]))

gives this result:

'numpy.ndarray'
'numpy.float64'

However, I need them both to be numpy arrays. If it's not from a pandas data frame and I just convert a hard-coded list of lists then they are both ndarrays. How do I get the list, and the lists in that list to be ndarrays when that list has been made from a data frame? Many thanks for reading, this has had me stumped for hours.

max89

3 Answers


I think you need values:

df = pd.DataFrame({'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0]})

print (df)
   C  D
0  7  1
1  8  3
2  9  5
3  4  7
4  2  1
5  3  0

x = df.values
print (x)
[[7 1]
 [8 3]
 [9 5]
 [4 7]
 [2 1]
 [3 0]]

And then select by indexing:

print (x[:,0])
[7 8 9 4 2 3]

print (x[:,1])
[1 3 5 7 1 0]

print (type(x[:,0]))
<class 'numpy.ndarray'>

It is also possible to transpose the array:

x = df.values.T
print (x)
[[7 8 9 4 2 3]
 [1 3 5 7 1 0]]

print (x[0])
[7 8 9 4 2 3]

print (x[1])
[1 3 5 7 1 0]
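
Since the goal in the question is matrix multiplication, here is a minimal sketch of how the array from `df.values` might be used with the `@` operator. The weight matrix `w` is a made-up example, not something from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0]})

x = df.values                 # 2-D ndarray of shape (6, 2)

# Hypothetical 2x2 weight matrix, chosen only for illustration
w = np.array([[1, 0],
              [0, 2]])

product = x @ w               # matrix product, shape (6, 2)
row_type = type(x[0])         # each row of a 2-D array is itself an ndarray
```

Because `x` is two-dimensional, both `x` and `x[0]` are ndarrays, which is what the question asks for.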
jezrael

How about as_matrix:

x = df.as_matrix()
zipa
  • Seems like `as_matrix` is deprecated as of pandas version 0.23.0, and that `values` should be used instead – ratiaris Aug 15 '18 at 21:40
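
For newer pandas versions, a short sketch of a replacement: `as_matrix` was removed in pandas 1.0, and `DataFrame.to_numpy()` (available since 0.24) can be used alongside `values`. The small frame below is an illustrative assumption based on the example in the first answer.

```python
import numpy as np
import pandas as pd

# Small example frame, assumed for illustration
df = pd.DataFrame({'C': [7, 8, 9], 'D': [1, 3, 5]})

x = df.to_numpy()                      # 2-D ndarray built from the frame
same = np.array_equal(x, df.values)    # compare against the .values result
```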

You may want to try df.get_values(), and then np.reshape the result if needed.
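
A sketch of the reshape idea, assuming the data actually lives in a flat Series (which would explain why `x[0]` came back as `numpy.float64` in the question). `get_values` was removed in pandas 1.0, so `to_numpy()` stands in for it here. The Series `s` and the two-column target shape are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical flat Series holding the values 1..8
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])

flat = s.to_numpy()           # 1-D array: flat[0] is a numpy.float64 scalar
x = flat.reshape(-1, 2)       # 2-D array with two columns: x[0] is an ndarray
```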

nsaura