48

I am new to Python and am having trouble converting a row in a DataFrame into a flat list. To do this I use the following code:

Toy DataFrame:

import pandas as pd
d = {
     "a": [1, 2, 3, 4, 5],
     "b": [9, 8, 7, 6, 5],
     "n": ["a", "b", "c", "d", "e"]
}

df = pd.DataFrame(d)

My code:

from functools import reduce  # reduce is not a builtin in Python 3

df_note = df.loc[df.n == "d", ["a", "b"]].values  # convert to array
df_note = df_note.tolist()  # convert to nested list
df_note = reduce(lambda x, y: x + y, df_note)  # convert to flat list

To me this code appears to be both gross and inefficient. Converting to an array before converting to a list is what causes the problem, i.e. the nested list. Even so, I cannot find a way to convert the row directly to a list. Any advice?

This question is not a duplicate of this one. In my case, I want the list to be flat.

Jacob H

3 Answers

53

You are almost there: use flatten instead of reduce to unnest the array (rather than unnesting the list), and chain the operations into a one-liner:

df.loc[df.n == "d", ['a','b']].values.flatten().tolist()
#[4, 6]
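
For reference, here is a self-contained sketch of the one-liner, using the toy DataFrame from the question. The names `sub`, `nested` and `flat` are only illustrative; they show the intermediate 2-D shape that makes the flattening necessary.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [9, 8, 7, 6, 5],
    "n": ["a", "b", "c", "d", "e"],
})

# A boolean mask selects a one-row DataFrame, so .values is a 2-D array.
sub = df.loc[df.n == "d", ["a", "b"]]

nested = sub.values.tolist()          # list of rows: [[4, 6]]
flat = sub.values.flatten().tolist()  # flattened first: [4, 6]

print(nested)
print(flat)
```

flatten always returns a copy of the array, which is harmless here because the result is converted to a list straight away.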
Colonel Beauvel
    A general solution (less specific to the example) is: `df.loc[index, :].values.flatten().tolist()` where `index` is the index of the pandas Dataframe row you want to convert. – Mahsan Nourani Jun 25 '21 at 18:52
29

You get a nested list because you select a sub data frame.

This takes a row, which can be converted to a list without flattening:

df.loc[0, :].values.tolist()
[1, 9, 'a']

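Alternatively, index into the nested list to take its only element (here df_note is the sub data frame df.loc[df.n == "d", ["a", "b"]], before any conversion):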

df_note.values.tolist()[0]
[4, 6]

The values are stored in a NumPy array, so you are not really converting them. Pandas uses NumPy extensively under the hood, and the attribute access df_note.values just exposes the array behind that part of the data frame.
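
The difference between selecting a row and selecting a sub data frame can be checked directly. The following is a small sketch on the question's toy DataFrame; the variable names (`row`, `sub`, `row_list`, `first_row_list`) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [9, 8, 7, 6, 5],
    "n": ["a", "b", "c", "d", "e"],
})

# A scalar row label gives a 1-D Series.
row = df.loc[3, ["a", "b"]]

# A boolean mask gives a 2-D DataFrame, even when only one row matches.
sub = df.loc[df.n == "d", ["a", "b"]]

print(type(row).__name__, row.shape)  # Series (2,)
print(type(sub).__name__, sub.shape)  # DataFrame (1, 2)

# The Series converts straight to a flat list.
row_list = row.tolist()

# For the DataFrame, pick the single row first, then convert.
first_row_list = sub.iloc[0].tolist()
```

Both `row_list` and `first_row_list` hold the values 4 and 6; only the mask-based selection needs the extra step of picking the row.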

Mike Müller
1

I am assuming you're explicitly selecting columns a and b only to get rid of column n, which you are solely using to select the wanted row.

In that case, you could also use the n column as the index first, using set_index:

>>> dfi = df.set_index('n')
>>> dfi.loc['d'].tolist()  # .ix was removed in pandas 1.0; .loc does the label lookup
[4, 6]
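
If the frame has more columns than the ones you want, the same label lookup can also take a column list. The sketch below assumes a hypothetical extra column `c` that is not in the question's data, added purely to show the subsetting.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [9, 8, 7, 6, 5],
    "c": [10, 20, 30, 40, 50],  # hypothetical extra column, for illustration
    "n": ["a", "b", "c", "d", "e"],
})

# Use column n as the index so rows can be looked up by its values.
dfi = df.set_index("n")

whole_row = dfi.loc["d"].tolist()           # every remaining column
subset = dfi.loc["d", ["a", "b"]].tolist()  # only columns a and b

print(whole_row)
print(subset)
```

This relies on the values in `n` being unique. With a duplicated label, `dfi.loc["d"]` returns a DataFrame rather than a Series, and the result would be nested again.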
Takis
  • Maybe the OP has more columns and wants to subset only `a` and `b`, in which case the above does not work on a more generic dataframe (but a good approach still). – Colonel Beauvel Dec 12 '15 at 10:00
  • You're right, my answer was based on my interpretation of what the OP was trying to do, and thus less generic. I've edited my answer to clarify this. – Takis Dec 12 '15 at 10:12