Converting numpy ndarray into pandas dataframe with column names and types

Question

Edit: As explained below in @floydian's comment, the problem was that calling a = np.array(a, dtype=d) creates a doubled array (every value is copied into both fields), which was causing the problem.

I am aware that this has already been asked multiple times, and in fact I am looking at the answer to "Creating a Pandas DataFrame with a numpy array containing multiple types" right now. But I still have a problem with the conversion. It must be something very simple that I am missing, and I hope someone can point it out. Sample code below:

import numpy as np
import pandas as pd

a = np.array([[1, 2], [3, 4]])
d = [('x','float'), ('y','int')]
a = np.array(a, dtype=d)

# Try 1
df = pd.DataFrame(a)
# Result - ValueError: If using all scalar values, you must pass an index

# Try 2
i = [1, 2]
df = pd.DataFrame(a, index=i)
# Result - Exception: Data must be 1-dimensional

I mean, yeah, but what does it look like? Is the first column int and the second float? — cs95, Mar 06 '18 at 19:44
oh right, yes that's the idea: the first column int and the second float — shiftyscales, Mar 06 '18 at 19:45

score 2 · Answer 1 · answered Mar 06 '18 at 20:15

I would define the array like this:

a = np.array([(1, 2), (3, 4)], dtype=[('x','float'), ('y', 'int')])
pd.DataFrame(a)

gets what you want.
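A minimal sketch of this approach, assuming a recent NumPy and pandas. The `plain` array and the row-to-tuple conversion are illustrative additions (not part of the answer) for the case where the data starts out as an ordinary 2-D array:

```python
import numpy as np
import pandas as pd

d = [('x', 'float'), ('y', 'int')]

# Hypothetical starting point: an ordinary 2-D array, one row per record.
plain = np.array([[1, 2], [3, 4]])

# Each row is turned into a tuple so that NumPy reads it as one record
# with fields x and y, instead of casting every scalar into both fields.
records = np.array([tuple(row) for row in plain], dtype=d)

df = pd.DataFrame(records)
print(records.shape)  # 1-D: one entry per original row
print(df.dtypes)      # one dtype per named field
```

The key design point is that the structured array is 1-D, which is the shape pd.DataFrame expects when it derives columns from field names.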

Floydian

Unfortunately in my original problem the array is defined in the way I did it above. – shiftyscales Mar 06 '18 at 20:21

`a = np.array(a, dtype=d)` creates two 2x2 arrays, `a['x']` with float as dtype applied to all values and `a['y']` with int as dtype applied to all values. I am guessing this is the reason why you're getting all those errors. – Floydian Mar 06 '18 at 20:43
That was indeed the reason. Thank you for pointing me in the right direction. – shiftyscales Mar 07 '18 at 15:43
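The behaviour described in the comment above can be checked directly. This is a small sketch, assuming a recent NumPy; the name `s` is only introduced here to keep the original `a` intact:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
d = [('x', 'float'), ('y', 'int')]

# Casting a plain 2-D array to a structured dtype keeps the 2x2 shape
# and writes each scalar into every field of its record.
s = np.array(a, dtype=d)

print(s.shape)  # still 2-D, not one record per row
print(s['x'])   # all four values, as floats
print(s['y'])   # the same four values, as ints
```

Because the result is a 2-D structured array rather than a 1-D array of records, pd.DataFrame cannot map it onto rows and named columns.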

score 1 · Answer 2 · answered Mar 06 '18 at 21:16

One option to separate it after the fact could be e.g.

pd.DataFrame(a.astype("float32").T, columns=a.dtype.names).astype({k: v[0] for k, v in a.dtype.fields.items()})

Out[296]: 
     x  y
0  1.0  3
1  2.0  4
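A variant of the same idea, offered only as a sketch: since every field of the 2x2 structured array holds a copy of the original values, one field can be pulled out as a plain array and the per-column types applied afterwards. It assumes the first field's dtype can represent the values without loss (here it is float), and, unlike the output above, it does not transpose, so array columns become DataFrame columns. The names `plain` and `col_types` are illustrative.

```python
import numpy as np
import pandas as pd

a = np.array([[1, 2], [3, 4]])
d = [('x', 'float'), ('y', 'int')]
a = np.array(a, dtype=d)  # the 2x2 structured array from the question

names = list(a.dtype.names)

# Every field carries the full 2x2 data, so one field is enough
# to recover an ordinary (unstructured) array.
plain = a[names[0]]

# dtype.fields maps each name to (dtype, offset); keep only the dtype.
col_types = {name: a.dtype.fields[name][0] for name in names}

df = pd.DataFrame(plain, columns=names).astype(col_types)
print(df)
print(df.dtypes)
```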

erocoar