1

I have three Pandas columns where element are list. For combining these lists, I can do by explicitly write the name of column and + them together

df = pd.DataFrame({'allmz':([[1,2,3],[2,4,5],[2,5,5],[2,3,5],[1,4,5]]),'allint':([[11,31,31],[21,41,51],[41,51,51],[11,31,51],[1,51,11]]), 'allx':([[6,7,3],[2,4,5],[2,5,5],[2,9,5],[3,4,5]])})
df['new'] = df['allmz'] + df['allint'] + df['allint']
print df

      allint      allmz       allx                                new
0  [11, 31, 31]  [1, 2, 3]  [6, 7, 3]  [1, 2, 3, 11, 31, 31, 11, 31, 31]
1  [21, 41, 51]  [2, 4, 5]  [2, 4, 5]  [2, 4, 5, 21, 41, 51, 21, 41, 51]
2  [41, 51, 51]  [2, 5, 5]  [2, 5, 5]  [2, 5, 5, 41, 51, 51, 41, 51, 51]
3  [11, 31, 51]  [2, 3, 5]  [2, 9, 5]  [2, 3, 5, 11, 31, 51, 11, 31, 51]
4   [1, 51, 11]  [1, 4, 5]  [3, 4, 5]    [1, 4, 5, 1, 51, 11, 1, 51, 11]

However, if I have too many column names to write each of them, is there a way to do it by looping (or not looping) the list of column name: columns = ['allmz','allint','allx'] instead?

cs95
  • 379,657
  • 97
  • 704
  • 746
Jan
  • 1,389
  • 4
  • 17
  • 43

3 Answers3

3

Option 1
Slice on the columns and call sum along the first axis.

df['new'] = df[['allmz','allint','allx']].sum(axis=1)

df
         allint      allmz       allx                             new
0  [11, 31, 31]  [1, 2, 3]  [6, 7, 3]  [1, 2, 3, 11, 31, 31, 6, 7, 3]
1  [21, 41, 51]  [2, 4, 5]  [2, 4, 5]  [2, 4, 5, 21, 41, 51, 2, 4, 5]
2  [41, 51, 51]  [2, 5, 5]  [2, 5, 5]  [2, 5, 5, 41, 51, 51, 2, 5, 5]
3  [11, 31, 51]  [2, 3, 5]  [2, 9, 5]  [2, 3, 5, 11, 31, 51, 2, 9, 5]
4   [1, 51, 11]  [1, 4, 5]  [3, 4, 5]   [1, 4, 5, 1, 51, 11, 3, 4, 5]

Option 2
Another option with np.concatenate:

v = df[['allmz','allint','allx']].values.tolist()
df['new'] = np.concatenate(v, axis=0).reshape(len(df), -1).tolist()

df

         allint      allmz       allx                             new
0  [11, 31, 31]  [1, 2, 3]  [6, 7, 3]  [1, 2, 3, 11, 31, 31, 6, 7, 3]
1  [21, 41, 51]  [2, 4, 5]  [2, 4, 5]  [2, 4, 5, 21, 41, 51, 2, 4, 5]
2  [41, 51, 51]  [2, 5, 5]  [2, 5, 5]  [2, 5, 5, 41, 51, 51, 2, 5, 5]
3  [11, 31, 51]  [2, 3, 5]  [2, 9, 5]  [2, 3, 5, 11, 31, 51, 2, 9, 5]
4   [1, 51, 11]  [1, 4, 5]  [3, 4, 5]   [1, 4, 5, 1, 51, 11, 3, 4, 5]
cs95
  • 379,657
  • 97
  • 704
  • 746
  • Better than mine. – Alex Jan 21 '18 at 11:07
  • @Alex Thank you, but you needn't have deleted :-) – cs95 Jan 21 '18 at 11:08
  • Yes @Alex I just about to say I got error from your code `TypeError: can only concatenate list (not "unicode"). It makes me discover the difference between my toy and real data – Jan Jan 21 '18 at 11:10
  • 1
    Undeleted... @Jan are those lists really strings? Works fine for me. – Alex Jan 21 '18 at 11:11
  • @cᴏʟᴅsᴘᴇᴇᴅ Option 1 returns all zero. Option 2 works but it is included `NaN`. Can I just use `.dropna()` somewhere? – Jan Jan 21 '18 at 11:14
  • @Jan Yeah, but you can only use dropna after assigning the column. Something like `df = df.dropna(['new'])` – cs95 Jan 21 '18 at 11:15
  • One more thing, I just check the combined item on the real data. This is just 4 columns combined together. Each list doesn't merge. I mean it is still 4 lists in a list. Is there a way to make them all in a single list? The data looks like this: `[u"[(u'R', 175.11895000000001), (u'RA', 246.15606), (u'RAS', 333.18808999999999), (u'RASS', 420.22012000000001), (u'RASSA', 491.25722999999999), (u'RASSAL', 604.34129000000007), (u'RASSALV', 703.40970000000004)]", u"[(u'RASSALV', 352.20848999999998)]", u'[nan]', u'[nan]']` – Jan Jan 21 '18 at 11:22
  • @Jan Huh... those _aren't_ lists... they're string representation of lists. Do this first: `import ast; df[['allmz','allint','allx']] = df[['allmz','allint','allx']].applymap(ast.literal_eval)` – cs95 Jan 21 '18 at 11:35
  • Got `ValueError: ('malformed string', u'occurred at index ms2_Y3')` which is the column name where it has data as `nan` – Jan Jan 21 '18 at 11:40
  • 1
    @Jan Sorry, but this is beyond the scope of your original question, and this answer works for the data provided. Now, converting strings to lists is a different story, for which I'll give you this link: https://stackoverflow.com/questions/48008191/attributeerror-pandasexprvisitor-object-has-no-attribute-visit-ellipsis-us – cs95 Jan 21 '18 at 11:46
2

You can use Python's builtin sum function.

df['new'] = sum([df[col] for col in df], [])
Alex
  • 18,484
  • 8
  • 60
  • 80
1

If you are having a large set of column's name then an easy way to solve this problem is shown below :

col = df.loc[: , "allint":"allx"]

where "allint" is the start column name and "allx" is the end column name

df['new'] = col.sum(axis=1)
df

This will give you the same result you got after writing the name of each columns.

PraneetNigam
  • 959
  • 9
  • 16