How do I convert a Pandas series or index to a NumPy array?

Question

How can I get the index or column of a DataFrame as a NumPy array or Python list?

Also, related: [Convert pandas dataframe to NumPy array](https://stackoverflow.com/a/54508052/4909087) — cs95, Feb 05 '19 at 05:49
Does this answer your question? [Convert pandas dataframe to NumPy array](https://stackoverflow.com/questions/13187778/convert-pandas-dataframe-to-numpy-array) — AMC, Jan 07 '20 at 19:45
**NOTE:** Having to convert Pandas DataFrame to an array (or list) like this can be indicative of other issues. I strongly recommend ensuring that a DataFrame is the appropriate data structure for your particular use case, and that Pandas does not include any way of performing the operations you're interested in. — AMC, Jan 07 '20 at 20:22
**Concerning my vote to reopen this question:** Technically, a pandas series is not the same as a pandas dataframe. The answers may be the same, but the questions are definitely different. — Serge Stroobandt, Aug 25 '21 at 09:51

score 378 · Accepted Answer · edited Sep 02 '22 at 23:28


To get a NumPy array, you should use the values attribute:

In [1]: df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c']); df
   A  B
a  1  4
b  2  5
c  3  6

In [2]: df.index.values
Out[2]: array(['a', 'b', 'c'], dtype=object)

This returns the data as it is already stored, so no conversion is needed.

Note: This attribute is also available for many other pandas objects.

In [3]: df['A'].values
Out[3]: array([1, 2, 3])

To get the index as a list, call tolist:

In [4]: df.index.tolist()
Out[4]: ['a', 'b', 'c']

And similarly, for columns.
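`df.columns` is also an Index, so the same calls apply to it. Here is a minimal sketch that reuses the small frame from above as illustrative data:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

# Index labels as an array and as a Python list
index_array = df.index.values       # 'a', 'b', 'c' as an array
index_list = df.index.tolist()      # ['a', 'b', 'c']

# Column labels: df.columns is an Index, so the same calls work
columns_array = df.columns.values   # 'A', 'B' as an array
columns_list = df.columns.tolist()  # ['A', 'B']

# A single column is a Series, which also supports .tolist()
col_a_list = df['A'].tolist()       # [1, 2, 3]
```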

edited Sep 02 '22 at 23:28 by Peter Mortensen · answered Jun 21 '13 at 18:51 by Andy Hayden


Note: `.values` is deprecated, `.to_numpy()` is the suggested replacement if you want a NumPy array. Can you expand on _This accesses how the data is already stored, so there's no need for a conversion_? – AMC Jan 09 '20 at 21:32
The [answer by cs95](https://stackoverflow.com/a/54324513/11301900) gives a great explanation of `.values`, `.to_numpy()` and `.array`. – AMC Jan 09 '20 at 21:49

score 75 · Answer 2 · edited Feb 03 '19 at 21:31


You can use df.index to access the index object and then get the values in a list using df.index.tolist(). Similarly, you can use df['col'].tolist() for Series.
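Note that `tolist` is a method, so it needs parentheses. If you reference `df.index.tolist` without calling it, you get a bound method back instead of a list, which is possibly what the first comment below ran into. Here is a small sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({'col': [10, 20, 30]}, index=['x', 'y', 'z'])

labels = df.index.tolist()     # calling the method -> ['x', 'y', 'z']
not_called = df.index.tolist   # no parentheses -> a bound method, not a list

values = df['col'].tolist()    # Series.tolist() -> [10, 20, 30]

# Going through a NumPy array first gives the same list
via_numpy = df.index.to_numpy().tolist()
```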

edited Feb 03 '19 at 21:31 by cs95 · answered Jun 21 '13 at 17:32 by bdiamante

It returns an instancemethod, not a list. – V Shreyas Jun 01 '16 at 10:45

@VShreyas, how about `df.index.values.tolist()`? – LancelotHolmes Mar 10 '17 at 02:06

`df.index.tolist()` doesn't return an instance method. It returns a list of indices. It is a method defined on pandas index. While calling values first is a possibility, delegating the job to numpy is not a correction - just an alternative. – ayhan May 20 '17 at 08:08

score 70 · Answer 3 · edited Sep 02 '22 at 11:19

pandas >= 0.24

Deprecate your usage of `.values` in favour of these methods!

From v0.24.0 onwards, we will have two brand spanking new, preferred methods for obtaining NumPy arrays from Index, Series, and DataFrame objects: `.to_numpy()` and `.array`. Regarding usage, the docs mention:

We haven’t removed or deprecated Series.values or DataFrame.values, but we highly recommend using .array or .to_numpy() instead.

See this section of the v0.24.0 release notes for more information.

to_numpy() Method

df.index.to_numpy()
# array(['a', 'b'], dtype=object)

df['A'].to_numpy()
#  array([1, 4])

By default (`copy=False`), no copy is forced, so for NumPy-backed data you typically get a view of the underlying array, and any modifications made to it will affect the original.

v = df.index.to_numpy()
v[0] = -1
 
df
    A  B
-1  1  2
b   4  5

If you need a copy instead, use `to_numpy(copy=True)`:

v = df.index.to_numpy(copy=True)
v[-1] = -123
 
df
   A  B
a  1  2
b  4  5
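One way to check whether you got a copy without mutating the frame is `np.shares_memory`. Here is a minimal sketch, assuming a plain int64 column. Sharing memory by default is an implementation detail for NumPy-backed data, so only `copy=True` carries a guarantee:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5]], columns=['A', 'B'], index=['a', 'b'])
s = df['A']

default = s.to_numpy()          # no copy forced (copy=False)
copied = s.to_numpy(copy=True)  # always a freshly allocated array

# copy=True guarantees independent memory
print(np.shares_memory(copied, default))  # False

# For a NumPy-backed int64 column the default call usually shares memory
print(np.shares_memory(s.to_numpy(), default))
```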

Note that this function also works for DataFrames (while .array does not).
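On a DataFrame, `to_numpy()` returns a 2-D array of the values without the row and column labels, upcasting mixed columns to a common dtype. Here is a short sketch using the same 2x2 frame:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [4, 5]], columns=['A', 'B'], index=['a', 'b'])

arr = df.to_numpy()  # 2-D array; index and column labels are dropped
print(arr.shape)     # (2, 2)

# Mixed int and string columns are upcast to object dtype
mixed = pd.DataFrame({'n': [1, 2], 's': ['x', 'y']}).to_numpy()
print(mixed.dtype)   # object

# A DataFrame has no .array attribute
print(hasattr(df, 'array'))  # False
```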

array Attribute
This attribute returns an ExtensionArray object that backs the Index/Series.

pd.__version__
# '0.24.0rc1'

# Setup.
df = pd.DataFrame([[1, 2], [4, 5]], columns=['A', 'B'], index=['a', 'b'])
df

   A  B
a  1  2
b  4  5


df.index.array    
# <PandasArray>
# ['a', 'b']
# Length: 2, dtype: object

df['A'].array
# <PandasArray>
# [1, 4]
# Length: 2, dtype: int64

From here, it is possible to get a list using list:

list(df.index.array)
# ['a', 'b']

list(df['A'].array)
# [1, 4]

or, just directly call .tolist():

df.index.tolist()
# ['a', 'b']

df['A'].tolist()
# [1, 4]
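One practical difference between `.tolist()` and `.to_numpy()` is the element type. `.tolist()` converts elements to Python scalars, while `.to_numpy()` keeps NumPy scalar types. Here is a quick check on the `'A'` column values:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 4], index=['a', 'b'], name='A')

as_list = s.tolist()
as_array = s.to_numpy()

print(type(as_list[0]))   # <class 'int'>          (Python scalar)
print(type(as_array[0]))  # <class 'numpy.int64'>  (NumPy scalar)
```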

Regarding what is returned, the docs mention:

For Series and Indexes backed by normal NumPy arrays, Series.array will return a new arrays.PandasArray, which is a thin (no-copy) wrapper around a numpy.ndarray. arrays.PandasArray isn’t especially useful on its own, but it does provide the same interface as any extension array defined in pandas or by a third-party library.

So, to summarise, `.array` will return either:

- the existing ExtensionArray backing the Index/Series, or
- if there is a NumPy array backing the Series, a new ExtensionArray object created as a thin wrapper over the underlying array.
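Both cases can be checked with `isinstance`. Here is a sketch with a categorical Series, which is already backed by an ExtensionArray, and a plain int64 Series, which is backed by a NumPy array:

```python
import numpy as np
import pandas as pd

# Case 1: a categorical Series is backed by an existing ExtensionArray (Categorical)
cat = pd.Series(['x', 'y', 'x'], dtype='category')
print(isinstance(cat.array, pd.Categorical))  # True

# Case 2: an int64 Series is backed by a NumPy array, so .array wraps it
nums = pd.Series([1, 2, 3])
wrapped = nums.array
print(isinstance(wrapped, pd.api.extensions.ExtensionArray))  # True
print(isinstance(wrapped, np.ndarray))                        # False
```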

Rationale for adding TWO new methods
These methods were added as a result of discussions under two GitHub issues, GH19954 and GH23623.

Specifically, the docs mention the rationale:

[...] with .values it was unclear whether the returned value would be the actual array, some transformation of it, or one of pandas custom arrays (like Categorical). For example, with PeriodIndex, .values generates a new ndarray of period objects each time. [...]

These two functions aim to improve the consistency of the API, which is a major step in the right direction.
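The period case from the quoted rationale can be seen with a period-dtype Series. Here `.values` converts to a NumPy object array of `Period` scalars, while `.array` returns the `PeriodArray` that actually backs the data. Here is a sketch with arbitrary monthly periods:

```python
import pandas as pd

s = pd.Series(pd.period_range('2020-01', periods=3, freq='M'))

vals = s.values  # NumPy object array of Period objects (a conversion)
arr = s.array    # the PeriodArray backing the Series

print(vals.dtype)                              # object
print(isinstance(vals[0], pd.Period))          # True
print(isinstance(arr, pd.arrays.PeriodArray))  # True
```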

Lastly, `.values` is not deprecated in the current version, but I expect this may happen at some point in the future, so I would urge you to migrate to the newer API as soon as you can.

`S = pd.Series( [3, 4] ); np.asarray( S ) is S.values` surprised me; would you know if this is documented anywhere ? (numpy 1.21.5, pandas 1.3.5) — denis, Jan 23 '22 at 14:28

score 49 · Answer 4 · answered Apr 21 '15 at 14:11

If you are dealing with a MultiIndex DataFrame, you may want to extract only the values of one named level of the index. You can do this with

df.index.get_level_values('name_sub_index')

and of course name_sub_index must be an element of the FrozenList df.index.names
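Here is a small sketch with a two-level index. The level names `'outer'` and `'inner'` are made up for illustration. `get_level_values` returns an Index, so the earlier `.to_numpy()` / `.tolist()` calls apply, and it also accepts an integer level position:

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('x', 1), ('x', 2), ('y', 1)], names=['outer', 'inner']
)
df = pd.DataFrame({'val': [10, 20, 30]}, index=idx)

# One level, selected by name; the result is a regular Index
inner = df.index.get_level_values('inner')
inner_array = inner.to_numpy()  # 1, 2, 1 as a NumPy array
inner_list = inner.tolist()     # [1, 2, 1]

# A level can also be selected by its integer position
outer_list = df.index.get_level_values(0).tolist()  # ['x', 'x', 'y']
```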

score 16 · Answer 5 · edited Mar 14 '19 at 18:29


Since pandas v0.13 you can also use `get_values` (it was deprecated in pandas 0.25 and removed in 1.0, so use `.to_numpy()` on current versions):

df.index.get_values()

edited Mar 14 '19 at 18:29 by cs95 · answered Nov 08 '14 at 11:42 by yemu


Is there a difference between this and .values? (I updated version info, since this function appears from the 0.13.0 docs.) – Andy Hayden Dec 12 '14 at 03:12
@Andy Hayden: Isn't one difference that .get_values is the official way to get only the current values while .values (e.g. on a multi-index) may return index values for which the rows or columns have been deleted? – Ezekiel Kruglick Oct 08 '15 at 22:07
@EzekielKruglick so it's always a copy? The linked to documentation is very light, I didn't think you get dupes like that (even if they're in the MI they won't be in the .values) would be great to see an example which demonstrates this! – Andy Hayden Oct 08 '15 at 22:16
@AndyHayden: I think I was reading your comment wrong. You're right, .values is good, .level gives outdated and get_values gives you the current values properly excluding dropped rows/cols. Original github issue: github.com/pydata/pandas/issues/3686 But I just checked and it looks like .values (of course!) gives up to date info just in a different form than I thought was what we were talking about – Ezekiel Kruglick Oct 08 '15 at 22:34

@AndyHayden No, there is no difference. `get_values` just calls `.values`. It is more characters to type. – cs95 Jan 23 '19 at 20:48

score 2 · Answer 6 · answered Apr 16 '20 at 00:44

A more recent way to do this is to use the `.to_numpy()` method.

If I have a dataframe with a column 'price', I can convert it as follows:

priceArray = df['price'].to_numpy()

You can also pass the data type, such as `float` or `object`, via the `dtype` argument.
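Here is a sketch of the `dtype` argument together with the related `na_value` argument, which is available in pandas 1.0 and later. It uses a made-up `'price'` column with one missing value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'price': pd.array([10, 20, None], dtype='Int64')})

# dtype= sets the NumPy dtype of the result;
# na_value= replaces pandas' missing-value marker (pd.NA) in the output
price_array = df['price'].to_numpy(dtype=float, na_value=np.nan)
print(price_array)  # [10. 20. nan]
```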

score 0 · Answer 7 · answered Jul 23 '18 at 13:30


I converted the DataFrame column to a list and then used the built-in `list.index()`. Something like this:

dd = list(zone[0]) #Where zone[0] is some specific column of the table
idx = dd.index(filename[i])

You then have the position of the value in `idx`.
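The same lookup can be done on the column directly. In this sketch, `zone_col` and `target` are hypothetical stand-ins for `zone[0]` and `filename[i]`. The list route returns the first matching position, and a boolean mask returns all matching positions or the matching index labels:

```python
import numpy as np
import pandas as pd

# Hypothetical column standing in for zone[0]
zone_col = pd.Series(['f1.txt', 'f2.txt', 'f3.txt', 'f2.txt'], index=[10, 11, 12, 13])
target = 'f2.txt'

# List route: first matching position (raises ValueError if absent)
idx = zone_col.tolist().index(target)  # 1

# All matching positions via a boolean mask on the NumPy array
positions = np.flatnonzero(zone_col.to_numpy() == target)  # positions 1 and 3

# Matching index labels rather than positions
mask = (zone_col == target).to_numpy()
labels = zone_col.index[mask].tolist()  # [11, 13]
```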

answered Jul 23 '18 at 13:30 by Sarvagya Gupta

_and then used the basic list.index()_ How is that related to the question of converting a Series to a list? – AMC May 01 '20 at 11:35

score -1 · Answer 8 · edited Sep 02 '22 at 23:34


Below is a simple way to convert a dataframe column into a NumPy array.

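import numpy as np
import pandas as pd

df = pd.DataFrame(somedict)  # somedict holds your data
ytrain = df['label']  # ytrain is already the 'label' column (a Series)
ytrain_numpy = np.array([x for x in ytrain])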

ytrain_numpy is a NumPy array.

I tried `.to_numpy()`, but it gave me the error below:

TypeError: no supported conversion for types: (dtype('O'),)

while doing Binary Relevance classification using Linear SVC.

`.to_numpy()` did convert the DataFrame into a NumPy array, but its elements were lists (so the array had object dtype), which caused the error above.
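This situation can be reproduced with a column whose cells are equal-length lists. `.to_numpy()` gives a 1-D object array of lists, while building the array from `.tolist()` (or using `np.stack`) gives a 2-D numeric array. Here is a sketch with made-up labels:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'label': [[1, 0], [0, 1], [1, 1]]})

obj = df['label'].to_numpy()  # 1-D, object dtype, each element is a list
print(obj.shape, obj.dtype)   # (3,) object

dense = np.array(df['label'].tolist())  # 2-D integer array
print(dense.shape)            # (3, 2)

stacked = np.stack(df['label'].to_numpy())  # same 2-D result via np.stack
```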

edited Sep 02 '22 at 23:34 by Peter Mortensen · answered Jun 07 '19 at 08:53 by Kumar Shubham

_I tried with to.numpy() but it gave me the below error: TypeError: no supported conversion for types: (dtype('O'),) while doing Binary Relevance classfication using Linear SVC. to.numpy() was converting the dataFrame into numpy array but the inner element's data type was list because of which the above error was observed._ That's not really the fault of `to_numpy`, though. – AMC May 01 '20 at 11:36
