2

How do I convert a column whose elements are lists into a single flat (non-nested) list?

For example, the column looks like this:

column
[1, 2, 3]
[1, 2]

I want to end up with the following:

[1,2,3,1,2]

But with column.tolist(), I get

[[1,2,3],[1,2]]

EDIT: Thanks for the help. My goal is to find the simplest (most elegant) and most efficient way to do this. For now I am using @jezrael's method:

from itertools import chain
output = list(chain.from_iterable(df[column]))

The simplest method was provided by @piRSquared, but it may be slower:

output = df[column].values.sum()
danche

4 Answers

7

You can use numpy.concatenate:

print (np.concatenate(df['column'].values).tolist())
[1, 2, 3, 1, 2]

Or:

from itertools import chain
print (list(chain.from_iterable(df['column'])))
[1, 2, 3, 1, 2]

Another solution, thanks juanpa.arrivillaga:

print ([item for sublist in df['column'] for item in sublist])
[1, 2, 3, 1, 2]

Timings:

import numpy as np
import pandas as pd

df = pd.DataFrame({'column':[[1,2,3], [1,2]]})
df = pd.concat([df]*10000).reset_index(drop=True)
print (df)

In [77]: %timeit (np.concatenate(df['column'].values).tolist())
10 loops, best of 3: 22.7 ms per loop

In [78]: %timeit (list(chain.from_iterable(df['column'])))
1000 loops, best of 3: 1.44 ms per loop

In [79]: %timeit ([item for sublist in df['column'] for item in sublist])
100 loops, best of 3: 2.31 ms per loop

In [80]: %timeit df.column.sum()
1 loop, best of 3: 1.34 s per loop
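
Outside IPython, the comparison above can be approximated with the standard timeit module. This is only a sketch: rows is a plain list of lists standing in for df['column'] (iterating a Series yields the same elements), the helper names flatten_chain and flatten_comprehension are made up for illustration, and the numbers will differ from machine to machine.

```python
import timeit
from itertools import chain

# Stand-in for df['column'] (assumption: a plain list of lists of the same shape).
rows = [[1, 2, 3], [1, 2]] * 10000


def flatten_chain(rows):
    # Lazily walk every sublist, then materialise the result as one list.
    return list(chain.from_iterable(rows))


def flatten_comprehension(rows):
    # Same result, written as a nested list comprehension.
    return [item for sublist in rows for item in sublist]


# Both approaches must agree before their timings are worth comparing.
assert flatten_chain(rows) == flatten_comprehension(rows)

for func in (flatten_chain, flatten_comprehension):
    seconds = timeit.timeit(lambda: func(rows), number=20)
    print(f"{func.__name__}: {seconds / 20 * 1000:.2f} ms per call")
```

Passing df['column'] instead of rows should work the same way, since both helpers only iterate over their argument.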
jezrael
2

Lists are concatenated with the + operator. Because pd.Series.sum uses its elements' underlying + operation, we can concatenate a whole column (a Series) of lists with:

df.column.sum()

[1, 2, 3, 1, 2]
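
To make the mechanism concrete, here is a rough sketch of that folding written with functools.reduce over plain lists. The name rows is a stand-in for the column values, and this only illustrates the idea; it is not pandas' actual implementation.

```python
from functools import reduce
import operator

# Stand-in for the values of df.column (assumption: plain Python lists).
rows = [[1, 2, 3], [1, 2]]

# Fold the rows together with +, starting from an empty list.
# Each + builds a brand-new list, so the inputs are left untouched.
flat = reduce(operator.add, rows, [])

print(flat)  # [1, 2, 3, 1, 2]
```

Because every + allocates a new, longer list, this style tends to get slow on long columns, which is consistent with the df.column.sum() timing reported in the other answer.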

But if you're looking for performance, consider cytoolz.concat:

import cytoolz

list(cytoolz.concat(df.column.values.tolist()))

[1, 2, 3, 1, 2]
piRSquared
0

You can use the extend method of list to do this:

col = {'col': [[1, 2, 3], [1, 2]]}
last = []
last.extend([i for c in col['col'] for i in c])
Pooja
0

Another solution that will work is the list.extend() method.

flat = []  # avoid naming this "list", which would shadow the built-in
for row in column:
    flat.extend(row)

Jason Stein