
I have a dataframe with two columns:

  In []: df.head()

  Out[]:
     specific_death  months_survival
  0           False              179
  1           False              127
  2           False               67
  3            True              111
  4           False              118

The first column has booleans while the second has integers. If I convert the dataframe to a numpy ndarray with:

array_from_df = df.to_numpy()

I get an unstructured numpy.ndarray. Thus if I write:

array_from_df.dtype.fields 

The result is None. For my program to work I need a structured array whose first field is a boolean type (np.bool_) and whose second is an integer type (e.g. np.int64). The way I see it there are two options, but I couldn't find a way to do either:

Option one

Transform directly from a pandas.DataFrame to a structured numpy.ndarray with the correct dtypes.

Option two

Transform from a pandas.DataFrame to an unstructured numpy.ndarray and then transform that into a structured numpy.ndarray. I found another SO question regarding this problem, but I couldn't replicate the answer in my code.
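
For illustration, a manual version of option two might look roughly like this (just a sketch, assuming the column names from df.head() above and that array_from_df comes from df.to_numpy(); this is not the code from the linked answer):

import numpy as np

# target structured dtype for the two columns (names assumed from df.head())
structured_dtype = np.dtype([("specific_death", np.bool_),
                             ("months_survival", np.int64)])

# allocate the structured array and fill it column by column
structured = np.empty(len(array_from_df), dtype=structured_dtype)
structured["specific_death"] = array_from_df[:, 0]   # first column -> bool field
structured["months_survival"] = array_from_df[:, 1]  # second column -> int field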

  • Maybe [this method](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_records.html) should do what you want – Ralubrusto Oct 28 '20 at 17:41
  • `df.to_records(index=False)` should do the job, no? For more options, see [this answer](https://stackoverflow.com/a/54508052/565489) for example. – Asmus Oct 28 '20 at 17:49
  • Thank you both for the quick answer, it seemed odd to me that Pandas hadn't already implemented a one-liner to do this trick. – Carlos Hernandez Perez Oct 28 '20 at 22:30

1 Answer


As both comments suggested:

array_from_df = df.to_records() # index=False to not include an index column

Outputs a numpy.recarray with the correct data types in:

array_from_df.dtype.fields
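
A minimal sketch of how that could look end to end (the column names are taken from the question; column_dtypes is optional and only needed if you want to pin down specific field types):

import numpy as np
import pandas as pd

df = pd.DataFrame({"specific_death": [False, False, False, True, False],
                   "months_survival": [179, 127, 67, 111, 118]})

# drop the index column and (optionally) force the field dtypes
rec = df.to_records(index=False,
                    column_dtypes={"specific_death": np.bool_,
                                   "months_survival": np.int64})

print(rec.dtype.fields)
# e.g. {'specific_death': (dtype('bool'), 0), 'months_survival': (dtype('int64'), 1)}

If you need a plain structured numpy.ndarray rather than a numpy.recarray, rec.view(np.ndarray) should give you one without copying the data.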