
I have a DataFrame in pandas, for example:

Col1 Col2
A     1 
B     2
C     3

Now I would like to add one more column, named Col3, whose value is based on Col2: if Col2 > 1, then Col3 is 0; otherwise it is 1. For the example above, the output would be:

Col1 Col2 Col3
A    1    1
B    2    0
C    3    0

Any idea on how to achieve this?

Seanny123
Santiago Munez

2 Answers


You just do the opposite comparison: Col2 <= 1. This returns a boolean Series that is False where Col2 is greater than 1 and True everywhere else. If you convert it to an integer dtype with astype(int), True becomes 1 and False becomes 0:

df['Col3'] = (df['Col2'] <= 1).astype(int)
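A minimal, self-contained way to check this against the question's sample data (the DataFrame construction just rebuilds the example table; nothing else is assumed):

```python
import pandas as pd

# Rebuild the example table from the question
df = pd.DataFrame({'Col1': ['A', 'B', 'C'], 'Col2': [1, 2, 3]})

# (Col2 <= 1) is a boolean Series; astype(int) maps True -> 1 and False -> 0
df['Col3'] = (df['Col2'] <= 1).astype(int)

print(df)
```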

If you want a more general solution, where you can assign any value to Col3 depending on the value of Col2, you can do something like:

df['Col3'] = df['Col2'].map(lambda x: 42 if x > 1 else 55)
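Applied to the question's sample data, a minimal sketch (42 and 55 are only the placeholder values from the answer). Because map calls the Python function once per element, it is flexible but slower than a vectorized comparison on large frames:

```python
import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'B', 'C'], 'Col2': [1, 2, 3]})

# map calls the lambda on each value of Col2 and collects the results
df['Col3'] = df['Col2'].map(lambda x: 42 if x > 1 else 55)

print(df['Col3'].tolist())  # [55, 42, 42]
```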

Or:

df['Col3'] = 0
condition = df['Col2'] > 1
df.loc[condition, 'Col3'] = 42
df.loc[~condition, 'Col3'] = 55
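A vectorized alternative that is not part of the original answer is numpy.where(condition, value_if_true, value_if_false). This sketch, again on the question's sample data with the answer's placeholder values 42 and 55, assigns Col3 in one step and checks that it matches the .loc version:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['A', 'B', 'C'], 'Col2': [1, 2, 3]})

# np.where picks 42 where the condition is True and 55 elsewhere
df['Col3'] = np.where(df['Col2'] > 1, 42, 55)

# The .loc approach from the answer, for comparison
df_loc = df[['Col1', 'Col2']].copy()
df_loc['Col3'] = 0
condition = df_loc['Col2'] > 1
df_loc.loc[condition, 'Col3'] = 42
df_loc.loc[~condition, 'Col3'] = 55

# Both approaches produce the same column
assert (df['Col3'] == df_loc['Col3']).all()
```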
Jonas Praem
Viktor Kerkez
  • Awesome. Thank you very much for your advice. I have tried this and it's working! – Santiago Munez Sep 22 '13 at 10:11
  • Can I use df['col4'] = df['col2', 'col1'].map(lambda x: 20 if x > 1 elif x > 10 x:40 else 100) – Payne Mar 03 '16 at 11:43
  • @Payne, no, this wouldn't work; it only works for exactly one column – VMAtm Jun 09 '16 at 23:10
  • I have a problem with dates not being serializable in the JSON output. I have several date ranges – Payne Jun 12 '16 at 16:34
  • Hi @VMAtm, how can I use multiple conditions to add a new column? For example, if both columns have numeric values and I want to use a condition like `if col1 > 2 and col2 > 1`, how can I use the above `lambda` solution? Please help! – Abdul Rehman Mar 31 '18 at 03:53
  • @AbdulRehman If you have a new question, ask it; do not use comments for discussion – VMAtm Mar 31 '18 at 03:55
  • Hi @VMAtm, sure, I have posted a question (https://stackoverflow.com/q/49586471/7644562). Can you take a look, please? – Abdul Rehman Mar 31 '18 at 09:57
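The comments ask about conditions on more than one column, or more than two outcomes. A hedged sketch with hypothetical columns col1/col2 and made-up values: combine boolean Series with & (Python's `and` raises a ValueError on a Series because its truth value is ambiguous) and pass the mask to numpy.where; for several outcomes, numpy.select returns the choice of the first condition that is True, or `default` if none is:

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data; col1/col2 mirror the names used in the comments
df = pd.DataFrame({'col1': [1, 3, 5, 3], 'col2': [2, 0, 2, 12]})

# Element-wise "and"; the parentheses are required because & binds tighter than >
both = (df['col1'] > 2) & (df['col2'] > 1)
df['flag'] = np.where(both, 1, 0)

# Several outcomes: conditions are checked in order, first match wins
conditions = [df['col2'] > 10, df['col2'] > 1]
choices = [40, 20]
df['level'] = np.select(conditions, choices, default=100)

print(df)
```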

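The easiest way I found to add a column to a DataFrame was the "add" function. Here's a snippet of code, which also writes the result to a CSV file. Note that passing the "columns" argument lets you set the name of the column (here it is the same as the name of the np.array used as the data source). Keep in mind that DataFrame.add performs element-wise addition aligned on index and columns: it behaves like a column append here only because each new column has a distinct name, so fill_value=0 lets the existing values pass through unchanged.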

import pandas as pd

# FF_maxRSSBasal, FF_maxRSSPrism, etc. are NumPy arrays defined earlier (not shown)
# now to create a pandas DataFrame
df = pd.DataFrame(data=FF_maxRSSBasal, columns=['FF_maxRSSBasal'])
# from here on, we use the trick of creating a new DataFrame and then "add"ing it;
# every column name is new, so fill_value=0 leaves the existing columns unchanged
df2 = pd.DataFrame(data=FF_maxRSSPrism, columns=['FF_maxRSSPrism'])
df = df.add(df2, fill_value=0)
df2 = pd.DataFrame(data=FF_maxRSSPyramidal, columns=['FF_maxRSSPyramidal'])
df = df.add(df2, fill_value=0)
df2 = pd.DataFrame(data=deltaFF_strainE22, columns=['deltaFF_strainE22'])
df = df.add(df2, fill_value=0)
df2 = pd.DataFrame(data=scaled, columns=['scaled'])
df = df.add(df2, fill_value=0)
df2 = pd.DataFrame(data=deltaFF_orientation, columns=['deltaFF_orientation'])
df = df.add(df2, fill_value=0)
# print(df)
df.to_csv('FF_data_frame.csv')
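A small self-contained check of the "add" approach, with two hypothetical arrays a and b standing in for the answer's data. For comparison it also builds the same frame with plain column assignment (df_direct['b'] = b), the more common idiom, which involves no arithmetic and would replace rather than sum a column if a name were reused:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the answer's arrays
a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

df = pd.DataFrame(data=a, columns=['a'])
# add() aligns on index and columns; 'b' is missing from df and 'a' from df2,
# so fill_value=0 makes each result column equal to its original values + 0
df = df.add(pd.DataFrame(data=b, columns=['b']), fill_value=0)

# Plain column assignment builds the same frame without any arithmetic
df_direct = pd.DataFrame({'a': a})
df_direct['b'] = b

print(df)
```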