
Suppose I have a dataframe:

    col1    col2    col3
0    1       5       2
1    7       13      NaN
2    9       1       NaN
3    NaN     7       NaN

How do I convert to a single list such as:

[1, 7, 9, 5, 13, 1, 7, 2]

I have tried:

df.values.tolist()

However, this returns a list of lists (with the values converted to floats) rather than a single list:

[[1.0, 5.0, 2.0], [7.0, 13.0, nan], [9.0, 1.0, nan], [nan, 7.0, nan]]

Note that the dataframe will contain an unknown number of columns. The order of the values is not important, as long as the list contains all the non-missing values in the dataframe.

I imagine I could write a function to unpack the values, but I'm wondering whether there is a simple built-in way to convert a dataframe to a series/list.

Alan

3 Answers


Following your current approach, you can flatten the array before converting it to a list. If you need to drop NaN values, you can do that after flattening as well:

import numpy as np

arr = df.to_numpy().flatten()
arr[~np.isnan(arr)].tolist()
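A self-contained sketch of this approach, rebuilding the question's frame (the column contents are inferred from the `tolist()` output shown in the question). It uses `ravel()`, which returns a view when it can instead of always copying like `flatten()`, and calls `.tolist()` on the filtered array so the result holds plain Python floats rather than NumPy scalars.

```python
import numpy as np
import pandas as pd

# Frame reconstructed from the question's tolist() output; missing cells are NaN
df = pd.DataFrame({
    "col1": [1, 7, 9, np.nan],
    "col2": [5, 13, 1, 7],
    "col3": [2, np.nan, np.nan, np.nan],
})

# Flatten the 2-D float array row by row into 1-D
arr = df.to_numpy().ravel()

# Keep only the non-NaN entries; .tolist() converts them to Python floats
values = arr[~np.isnan(arr)].tolist()
print(values)  # [1.0, 5.0, 2.0, 7.0, 13.0, 9.0, 1.0, 7.0]
```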

Also, newer versions of pandas recommend to_numpy() over .values.


An alternative, perhaps cleaner, approach is to stack your dataframe:

df.stack().tolist()
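A runnable sketch of the stack approach on the question's frame (column contents inferred from the question's `tolist()` output). The trailing `.dropna()` is a defensive assumption on my part: older pandas versions drop NaN inside `stack()` by default, while the newer implementation that pandas 2.1 made available through `future_stack=True` keeps them, so dropping explicitly should give the same result either way. Depending on the version, the plain `stack()` call may also emit a FutureWarning.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 7, 9, np.nan],
    "col2": [5, 13, 1, 7],
    "col3": [2, np.nan, np.nan, np.nan],
})

# stack() moves the columns into the index: one value per (row, column) pair,
# ordered row by row; dropna() removes any missing cells that remain
values = df.stack().dropna().tolist()
print(values)  # [1.0, 5.0, 2.0, 7.0, 13.0, 9.0, 1.0, 7.0]
```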
busybear

You can use DataFrame.stack():

In [12]: df = pd.DataFrame({"col1":[np.nan,3,4,np.nan], "col2":['test',np.nan,45,3]})

In [13]: df.stack().tolist()
Out[13]: ['test', 3.0, 4.0, 45, 3]
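Mixed types like these matter for the NumPy approach in the first answer: with a string column, `to_numpy()` returns an object-dtype array and `np.isnan` raises a `TypeError`. A sketch of a dtype-agnostic variant that builds the mask with `pd.notna` instead, using the same example frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": [np.nan, 3, 4, np.nan], "col2": ["test", np.nan, 45, 3]})

# Mixed column types produce an object-dtype array
arr = df.to_numpy().ravel()

# np.isnan is not defined for object arrays
try:
    np.isnan(arr)
    isnan_failed = False
except TypeError:
    isnan_failed = True

# pd.notna checks each element regardless of dtype
values = arr[pd.notna(arr)].tolist()
print(values)
```

Like `stack()`, this walks the frame row by row, so it should return the same values as the session above.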
Roushan

For an ordered list (as in the problem statement), assuming your data contains only integer values:

First, collect all items in the data frame column by column, then remove the NaN values from the list.

items = [item for sublist in [df[cols].tolist() for cols in df.columns] for item in sublist]
items = [int(x) for x in items if str(x) != 'nan']
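If the column-by-column order matters, pandas and NumPy can also produce it without a nested comprehension. A sketch on the question's frame (inferred from its `tolist()` output): `df.melt()` lays the columns out one after another in a single `value` column, and `ravel(order="F")` flattens the array in column-major order. Both keep the integer cast from this answer, so they carry the same integer-only assumption.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 7, 9, np.nan],
    "col2": [5, 13, 1, 7],
    "col3": [2, np.nan, np.nan, np.nan],
})

# Option 1: melt() places the columns end to end in a "value" column
via_melt = df.melt()["value"].dropna().astype(int).tolist()

# Option 2: flatten in Fortran (column-major) order, then drop NaN
arr = df.to_numpy().ravel(order="F")
via_ravel = arr[~np.isnan(arr)].astype(int).tolist()

print(via_melt)   # [1, 7, 9, 5, 13, 1, 7, 2]
print(via_ravel)  # [1, 7, 9, 5, 13, 1, 7, 2]
```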

For an unordered list, again assuming only integer values:

items = [int(x) for x in sum(df.values.tolist(),[]) if str(x) != 'nan']
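Note that `sum(list_of_lists, [])` builds a new list at every addition, so its running time grows quadratically with the number of rows. A sketch of the same unordered, integer-only idea using `itertools.chain.from_iterable`, which flattens in linear time, with `math.isnan` in place of the string comparison (this assumes every value in the frame is numeric):

```python
import math
from itertools import chain

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "col1": [1, 7, 9, np.nan],
    "col2": [5, 13, 1, 7],
    "col3": [2, np.nan, np.nan, np.nan],
})

# One Python list per row, e.g. [[1.0, 5.0, 2.0], [7.0, 13.0, nan], ...]
rows = df.to_numpy().tolist()

# chain.from_iterable walks each row in turn without concatenating lists
items = [int(x) for x in chain.from_iterable(rows) if not math.isnan(x)]
print(items)  # [1, 5, 2, 7, 13, 9, 1, 7]
```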
Learner13