Why is np.shape not showing all dimensions?

Question

I have a pandas column storing a np array in each row. The df looks like this:

0    [38, 324, -21]
1    [41, 325, -19]
2    [41, 325, -19]
3    [42, 326, -20]
4    [42, 326, -19]

I want to convert this column into a np array so I can use it as training data for a model. I convert it to one np array with this:

arr = df.c.values

Now, I would expect the shape of this array to be (5,3). However, when I run:

arr.shape

I get this:

(5,)

Further, if I run:

arr[0].shape

I get (3,).

Why don't I just get shape (5,3) when I run arr.shape?

It is object dtype: 5 separate arrays, not one 2d one. That's how they are stored in the frame. — hpaulj, Dec 08 '21 at 21:39
Since there's no guarantee all elements in the column have the same length, `.values` will not return a 2D array for you. However you can manually construct an array like @QuangHoang commented. — Psidom, Dec 08 '21 at 21:39
@QuangHoang Ahh, you are right. I forgot that `df.c.values` is already a numpy array. — Psidom, Dec 08 '21 at 21:43
I think this question is actually answered, and is about [how to convert a pandas table to a numpy array](https://stackoverflow.com/questions/13187778/convert-pandas-dataframe-to-numpy-array/). — D A, Dec 08 '21 at 23:04
Does this answer your question? [Convert pandas dataframe to NumPy array](https://stackoverflow.com/questions/13187778/convert-pandas-dataframe-to-numpy-array) — D A, Dec 08 '21 at 23:05
I think the issue for the OP here is that each row of column `"c"` is a numpy array, so neither `df.c.values` nor `df.c.to_numpy()` will give the desired result. — Andre, Dec 09 '21 at 09:11

score 3 · Accepted Answer · answered Dec 09 '21 at 09:04

You can see what df.c.values actually is by inspecting its output:

import numpy as np
import pandas as pd

df = pd.DataFrame()
# one 3-element integer array per row; the range matches the sample output below
df['c'] = [np.random.randint(-100, 100, 3) for _ in range(5)]

In [2]: df
Out[2]:
    c
0   [-80, 4, -84]
1   [88, 32, 85]
2   [-11, 71, 37]
3   [-78, 93, 50]
4   [30, 29, 28]

In [3]: df.c.values
Out[3]:
array([array([-80,   4, -84]), array([88, 32, 85]),
       array([-11,  71,  37]), array([-78,  93,  50]),
       array([30, 29, 28])], dtype=object)

So df.c.values is a 1-dimensional object array containing 5 individual arrays (hence df.c.values.shape == (5,)), not a 2D array.
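One way to confirm this is to check the dtype and the shapes directly. The sketch below rebuilds the column from the values in the question. The name `rows` is only illustrative, and building the frame from a dict is an assumption about how the data might be created.

```python
import numpy as np
import pandas as pd

# Rebuild a column whose cells are separate NumPy arrays (values from the question).
rows = [[38, 324, -21], [41, 325, -19], [41, 325, -19], [42, 326, -20], [42, 326, -19]]
df = pd.DataFrame({"c": [np.array(r) for r in rows]})

arr = df.c.values
print(arr.dtype)     # object: the outer array only holds references to the inner arrays
print(arr.shape)     # (5,): NumPy sees 5 opaque objects, not 5 x 3 numbers
print(arr[0].shape)  # (3,): each element is its own 1-D array
```

Because the outer array has object dtype, NumPy does not look inside its elements, which is why the second dimension never appears in `arr.shape`.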

To get a 2D array you need to combine the individual arrays into one. A straightforward way is to np.vstack() them:

arr = np.vstack(df.c.values)
arr.shape == (5,3)
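For comparison, here is a minimal sketch of `np.vstack` next to two other calls that should give the same result when every row has the same length, plus what happens when the rows are ragged (the case Psidom's comment refers to). The sample data and variable names are illustrative and do not come from the original answer.

```python
import numpy as np
import pandas as pd

rows = [[38, 324, -21], [41, 325, -19], [41, 325, -19], [42, 326, -20], [42, 326, -19]]
df = pd.DataFrame({"c": [np.array(r) for r in rows]})

stacked = np.vstack(df.c.values)     # stack the 5 rows vertically
also_2d = np.stack(df.c.values)      # same result for equal-length 1-D rows
from_list = np.array(df.c.tolist())  # build a new array from a list of rows

print(stacked.shape)  # (5, 3)

# With rows of different lengths there is no rectangular shape to produce,
# so stacking raises instead of returning a 2D array.
ragged = pd.Series([np.array([1, 2, 3]), np.array([4, 5])])
try:
    np.vstack(ragged.values)
except ValueError as exc:
    print("cannot stack ragged rows:", exc)
```

All three results are ordinary numeric 2D arrays rather than object arrays, so they can be passed to a model as training data.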
