
I want to get the values of a DataFrame column whose elements are NumPy arrays. With DataFrame.values the returned array has dtype object, but what I want is float64.

import numpy as np
import pandas as pd

a = pd.DataFrame({'vector': [np.array([1.1, 2, 3]), np.array([2.1, 3, 4])]})
print(a)

b = a['vector'].values
print(b.dtype)
print(b.shape)

c = np.array([i for i in a['vector']])
print(c.dtype)
print(c.shape)

>>>             vector
>>> 0  [1.1, 2.0, 3.0]
>>> 1  [2.1, 3.0, 4.0]
>>> object
>>> (2,)
>>> float64
>>> (2, 3)

Why do b and c have different dtypes?

c is what I want, but is there a better way to get the same result?

Hunger

2 Answers


Convert the Series to a list and then pass it to np.array, i.e.:

np.array(a['vector'].tolist())

array([[ 1.1,  2. ,  3. ],
       [ 2.1,  3. ,  4. ]])
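
As a minimal sketch of the same idea, np.stack can be used in place of np.array. It joins equal-length arrays along a new first axis and raises an error if the lengths differ, which may be a useful check. The DataFrame is the one from the question; the names via_list and via_stack are only illustrative.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'vector': [np.array([1.1, 2, 3]), np.array([2.1, 3, 4])]})

# Route 1: turn the column into a list of arrays, then build a 2-D array.
via_list = np.array(a['vector'].tolist())

# Route 2: stack the same list of equal-length arrays along a new first axis.
via_stack = np.stack(a['vector'].tolist())

print(via_list.dtype, via_list.shape)
print(via_stack.dtype, via_stack.shape)
```

Both routes assume every array in the column has the same length.
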
Bharath M Shetty

According to https://stackoverflow.com/a/33718947/2251785, numpy.concatenate should work too.

d=np.concatenate(a['vector'].values).reshape(len(a),-1)
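
Broken into two steps, the one-liner looks roughly like this. It is a sketch that assumes all arrays in the column have the same length, otherwise the reshape fails. The intermediate name flat is only illustrative.

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'vector': [np.array([1.1, 2, 3]), np.array([2.1, 3, 4])]})

# Step 1: join the per-row arrays end to end into one flat float array.
flat = np.concatenate(a['vector'].values)

# Step 2: fold the flat array back into one row per DataFrame row.
d = flat.reshape(len(a), -1)

print(flat.shape)
print(d.dtype, d.shape)
```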

I am still confused about why .values treats the arrays as object...
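
One way to look at it, offered as a sketch rather than an authoritative account of pandas internals: a column is one-dimensional, so each cell of 'vector' holds a whole array as a single Python object. .values then returns that 1-D container as it is, without stacking the cells into a 2-D float array. Inspecting the pieces seems consistent with this:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame({'vector': [np.array([1.1, 2, 3]), np.array([2.1, 3, 4])]})

b = a['vector'].values

# The outer array has one slot per row, and each slot holds a Python object.
print(b.dtype, b.shape)

# Each slot is itself an ordinary float64 ndarray.
print(type(b[0]), b[0].dtype, b[0].shape)
```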

Hunger