Pandas concatenate values of all column into a new column list

Question

I am trying to concat all my columns into a new column. The concatenated values should be stored in a list.

My dataframe:

df = pd.DataFrame({'A': ['1', '2', nan], 
                   'B': [nan, '5', nan], 
                   'C': ['7', nan, '9']})

desired output:

df:

A   B   C   concat_col
1   nan 7   [1,7]
2   5   nan [2,5]
nan nan 9   [9]

What i tried:

df['concat'] = pd.Series(df.fillna('').values.tolist()).str.join(',')

Output i got:

A   B   C   concat_col
1   nan 7   1,,7
2   5   nan 2,5,,
nan nan 9   ,,9

Joe Patten · Accepted Answer · 2018-09-25T22:25:50.307

3

The following code should work:

df['concat_col']=df.apply(lambda row: row.dropna().tolist(), axis=1)

edited Sep 25 '18 at 22:25

answered Sep 25 '18 at 22:13

Joe Patten

1,664
1
9
15

that worked! wondering why this doesnt work: `df['concat'] = pd.Series(df.dropna(axis=1).values.tolist()).str.join(',')` – jxn Sep 25 '18 at 22:17
1

because `df.dropna` will drop row or column containing nan, not only the value – ipramusinto Sep 25 '18 at 22:27
1

Yep, when you use dropna on a dataframe, it will drop a row or column contain nan. When you use dropna on a row (as what I did when I used the apply method on my df), it drops the nan's in every row. – Joe Patten Sep 25 '18 at 22:30

score 3 · Answer 2 · answered Sep 25 '18 at 22:52

3

You can use a list comprehension, taking advantage of the fact np.nan != np.nan:

df['D'] = [[i for i in row if i == i] for row in df.values]

print(df)

     A    B    C       D
0    1  NaN    7  [1, 7]
1    2    5  NaN  [2, 5]
2  NaN  NaN    9     [9]

Counter-intuitively, this is more efficient than Pandas methods:

df = pd.concat([df]*10000, ignore_index=True)

%timeit df.apply(lambda row: row.dropna().tolist(), axis=1)  # 8.25 s
%timeit [[i for i in row if i == i] for row in df.values]    # 55.6 ms

answered Sep 25 '18 at 22:52

jpp

159,742
34
281
339

Interesting, that is way faster. Thanks for sharing – jxn Sep 26 '18 at 00:51
anyway with out using list comprehension ? @jpp – Pyd Sep 26 '18 at 04:24
@pyd, Yes: `df.apply(lambda row: row.dropna().tolist(), axis=1)`. – jpp Jul 06 '20 at 08:31

Pandas concatenate values of all column into a new column list

2 Answers2

Linked