
I have the following data frame my_df:

name         numbers
----------------------
A             [4,6]
B             [3,7,1,3]
C             [2,5]
D             [1,2,3]

I want to combine all the numbers into one new list, so the output should be:

 new_numbers
---------------
[4,6,3,7,1,3,2,5,1,2,3]

And here is my code:

def combine_list(my_lists):
    new_list = []
    for x in my_lists:
        new_list.append(x)

    return new_list

new_df = my_df.agg({'numbers': combine_list})

but new_df still looks the same as the original:

              numbers
----------------------
0             [4,6]
1             [3,7,1,3]
2             [2,5]
3             [1,2,3]

What did I do wrong? How do I make new_df look like this:

 new_numbers
---------------
[4,6,3,7,1,3,2,5,1,2,3]

Thanks!

Edamame
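One likely problem in the function above is `append`: it adds each sublist as a single item, so even when it receives the whole column it returns a list of lists rather than a flat list. The minimal sketch below swaps in `extend` and calls the function directly on the column values instead of going through `agg`, which avoids depending on how `agg` dispatches a custom function. The plain list `numbers` is an assumed stand-in for `my_df['numbers']`, used so the sketch runs without pandas.

```python
def combine_list(my_lists):
    """Concatenate an iterable of lists into one flat list."""
    new_list = []
    for x in my_lists:
        # extend() adds the elements of x; append(x) would add x itself as one item
        new_list.extend(x)
    return new_list


# Stand-in for my_df['numbers']: iterating a pandas Series also yields its lists
numbers = [[4, 6], [3, 7, 1, 3], [2, 5], [1, 2, 3]]
print(combine_list(numbers))  # [4, 6, 3, 7, 1, 3, 2, 5, 1, 2, 3]
```

With pandas, the result can then be wrapped as a one-row frame, e.g. `pd.DataFrame({'new_numbers': [combine_list(my_df['numbers'])]})`, in the same way as the answers below.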

4 Answers


You need to flatten the values and then create a new DataFrame with the constructor:

flatten = [item for sublist in df['numbers'] for item in sublist]

Or:

import numpy as np

flatten = np.concatenate(df['numbers'].values).tolist()

Or:

from itertools import chain

flatten = list(chain.from_iterable(df['numbers'].values.tolist()))

import pandas as pd

df1 = pd.DataFrame({'numbers':[flatten]})

print(df1)
                             numbers
0  [4, 6, 3, 7, 1, 3, 2, 5, 1, 2, 3]
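A self-contained sketch that ties these pieces together using the question's own names (`my_df`, `new_numbers`), with the sample data rebuilt by hand as an assumption. It checks that the list comprehension and `chain.from_iterable` agree. The NumPy variant is left out only to keep the dependencies down.

```python
from itertools import chain

import pandas as pd

# Rebuild the question's sample data; ragged lists become an object column
my_df = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D'],
    'numbers': [[4, 6], [3, 7, 1, 3], [2, 5], [1, 2, 3]],
})

# Two of the flattening approaches above; iterating a Series yields its lists
flat_comprehension = [item for sublist in my_df['numbers'] for item in sublist]
flat_chain = list(chain.from_iterable(my_df['numbers']))

# Wrap the flat list in another list so it becomes a single cell in one row
new_df = pd.DataFrame({'new_numbers': [flat_chain]})
print(new_df)
```

The outer brackets in `[flat_chain]` matter: without them pandas would treat each number as its own row.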

Timings are here.

jezrael
  • `functools.reduce(lambda x,y: x+y,l)` should be even faster – BENY Oct 23 '17 at 20:01
  • @Wen - I think it depends on the size of the lists and the length of df, but [according to this answer](https://stackoverflow.com/a/953097/2901002) the fastest solution is `chain.from_iterable`. – jezrael Oct 23 '17 at 20:07
  • The first answer is kinda ambiguous to me. Is there a simplified for statement for the first list comprehension? – Maeror Sep 23 '22 at 01:46
  • @Maeror - it should be simpler now; I removed `.values.tolist()`. – jezrael Sep 23 '22 at 06:11
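On the question of a simpler form: the nested comprehension reads left to right like two nested `for` statements. The sketch below spells that out and also shows the `functools.reduce` variant from the first comment, with `operator.add` in place of the lambda and an empty-list start value (an assumption added so an empty column returns `[]` instead of raising). A plain list stands in for `df['numbers']`.

```python
import operator
from functools import reduce

# Stand-in for df['numbers']: any iterable of lists behaves the same way
numbers = [[4, 6], [3, 7, 1, 3], [2, 5], [1, 2, 3]]

# Explicit-loop form of: [item for sublist in numbers for item in sublist]
flatten = []
for sublist in numbers:      # outer "for" of the comprehension
    for item in sublist:     # inner "for" of the comprehension
        flatten.append(item)

# reduce variant: each + builds a new list, copying everything accumulated so far
flatten_reduce = reduce(operator.add, numbers, [])

print(flatten)
print(flatten_reduce)
```

No timings are claimed here; as the comments note, which variant is fastest depends on the data.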

You can use df['numbers'].sum(), which returns the combined list, to create the new DataFrame:

new_df = pd.DataFrame({'new_numbers': [df['numbers'].sum()]})

    new_numbers
0   [4, 6, 3, 7, 1, 3, 2, 5, 1, 2, 3]
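A runnable version of this answer, with the sample column rebuilt by hand as an assumption. It works because, on an object column of lists, `sum()` combines the values with `+`, which for Python lists means concatenation. Since each `+` copies the accumulated list, this is convenient for small frames, and a single-pass approach such as `chain.from_iterable` may scale better for long columns (an expectation, not a measured result).

```python
import pandas as pd

df = pd.DataFrame({'numbers': [[4, 6], [3, 7, 1, 3], [2, 5], [1, 2, 3]]})

# On an object column of lists, sum() reduces with +, i.e. list concatenation
combined = df['numbers'].sum()

# One row whose single cell holds the combined list
new_df = pd.DataFrame({'new_numbers': [combined]})
print(new_df)
```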
Vaishali

This should work:

newdf = pd.DataFrame({'numbers':[[x for i in my_df['numbers'] for x in i]]})

Puneet Tripathi

Check this question: pandas groupby and join lists

What you are looking for is:

my_df = my_df.groupby(['name']).agg(sum)

  • Please don't post link only answers to other stackoverflow questions. Instead, vote/flag to close as duplicate, or, if the question is not a duplicate, *tailor the answer to this specific question.* – Waqar UlHaq Feb 21 '20 at 11:02