2

I have a Pandas DataFrame that looks like this:

     NAME      total           total_temp
ID                                      
1     CVS     [abc1]       [cba, xyzzy01]
2  Costco     [bcd2, 22]   [dcb, xyzzy02]
3   Apple     [cde3]       [edc, xyzzy03]

I want to add create a new column total_temp_2 so that the data looks like this:

     NAME      total       total_temp                   total_temp_2
ID                                                  
1     CVS     [abc1]       [cba, xyzzy01]       [abc1, cba, xyzzy01]
2  Costco     [bcd2, 22]   [dcb, xyzzy02]   [bcd2, 22, dcb, xyzzy02]
3   Apple     [cde3]       [edc, xyzzy03]       [cde3, edc, xyzzy03]

I feel like I could guess my way through really inefficient ways to concatenate the lists, but I suspect I'm missing something I don't know about Pandas.

How can i achieve this operation using pandas?

cs95
  • 379,657
  • 97
  • 704
  • 746
k..
  • 401
  • 3
  • 11
  • 2
    I can't see why your solution `df['total'] + df['total_temp']` isn't working. If those are lists it should combine them together – yatu Feb 11 '19 at 16:13
  • Should not have issue . even it is str , it should not have errors – BENY Feb 11 '19 at 16:19
  • All right ... I'm an idiot ... I forgot that for some testing reason or another, I had putzed around with my data, and that I was actually dealing with the string `'[dcb, xyzzy02]'`, not the list `[dcb, xyzzy02]`. Simple answer works now that both columns are full of proper lists. – k.. Feb 11 '19 at 16:26

2 Answers2

4

When dealing with mixed types, I usually recommend using something like a list comprehension which has minimal memory and performance overhead.

df['total_temp_2'] = [x + y for x, y in zip(df['total'], df['total_temp'])]
df

      NAME       total      total_temp              total_temp_2
ID                                                              
1      CVS      [abc1]  [cba, xyzzy01]      [abc1, cba, xyzzy01]
2   Costco  [bcd2, 22]  [dcb, xyzzy02]  [bcd2, 22, dcb, xyzzy02]
3    Apple      [cde3]  [edc, xyzzy03]      [cde3, edc, xyzzy03]

If these are columns of strings, you can use ast.literal_eval to parse them:

import ast

c = df.select_dtypes(include=[object]).columns
df[c] = df[c].applymap(ast.literal_eval)

If the solution above throws ValueError: malformed node or string:, try using the yaml package instead.

import yaml
df = df.applymap(yaml.load)

Funnily enough, simple addition works for me on 0.24.

df['total'] + df['total_temp']

ID
1        [abc1, cba, xyzzy01]
2    [bcd2, 22, dcb, xyzzy02]
3        [cde3, edc, xyzzy03]
dtype: object

These also work,

df['total'].add(df['total_temp'])

ID
1        [abc1, cba, xyzzy01]
2    [bcd2, 22, dcb, xyzzy02]
3        [cde3, edc, xyzzy03]
dtype: object

df['total_temp'].radd(df['total'])

ID
1        [abc1, cba, xyzzy01]
2    [bcd2, 22, dcb, xyzzy02]
3        [cde3, edc, xyzzy03]
dtype: object

These are great in terms of simplicity, but inherently loopy since mixed type operations are harder to vectorize.

cs95
  • 379,657
  • 97
  • 704
  • 746
  • All right ... I'm an idiot ... I forgot that for some testing reason or another, I had putzed around with my data, and that I was actually dealing with the string `'[dcb, xyzzy02]'`, not the list `[dcb, xyzzy02]`. Simple answer works now that both columns are full of proper lists. Thanks for the alternatives in case I ever get into a pinch, though! – k.. Feb 11 '19 at 16:27
  • @k.. I've edited your question so your mistake is less obvious now :-) – cs95 Feb 11 '19 at 16:28
  • 1
    @anky_91 Amended! – cs95 Feb 11 '19 at 16:38
1

In a situation like this (wanting to apply a function to a dataframe I usually go to .apply(). So I would run this:

df['total_temp_2'] = df.apply(lambda x: x['total'] + x['total_temp'], axis=1)

Using the built-in pandas functionality is optimal for this type of transformation.

Taipan
  • 116
  • 6