
I have a pandas DataFrame (df) with the following columns:

month A B C D

The DataFrame holds data for, say, Jan, Feb, Mar and Apr. A, B, C and D are numeric columns. For the month of Feb only, I want to recalculate column A and update it in the DataFrame, i.e. for month = Feb, A = B + C + D.

The code I used:

 df[df['month']=='Feb']['A']=df[df['month']=='Feb']['B'] + df[df['month']=='Feb']['C'] + df[df['month']=='Feb']['D'] 

This ran without errors but did not change the values in column A for Feb. The console showed this warning:

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

I tried to use .loc, but I had already called .reset_index() on the DataFrame I am working with, and I am not sure how to set the index and use .loc. I followed the documentation but it is still not clear to me. Could you please help me out? Here is an example DataFrame:

 import pandas as pd
 import numpy as np
 dates = pd.date_range('1/1/2000', periods=8)
 df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D']) 

Say I want to update a single date: 2000-01-03. I cannot share a snippet of my actual data because it is real-time data.

Henry Ecker
Data Enthusiast
  • could you attach a little example of your dataframe? – Anton Protopopov Dec 28 '15 at 19:40
  • @AntonProtopopov : The dataframe I am working on is big, I tried to explain the logic here . I will see if I can create any dataframe – Data Enthusiast Dec 28 '15 at 19:42
  • you could attach like part of your dataframe with `df.head()` or `df.iloc[:10, :10]` – Anton Protopopov Dec 28 '15 at 19:44
  • Why not just `df['a'] = df.b + df.c + df.d`? You need to include sample data to clarify what you are trying to do and produce a MVE. [ask] – Alexander Dec 28 '15 at 19:59
  • Anton and Alexander : This is an example dataframe : import pandas as pd import numpy as np dates = pd.date_range('1/1/2000', periods=8) df = pd.DataFrame(np.random.randn(8, 4), index=dates, columns=['A', 'B', 'C', 'D']) I want to update say one date : 2000-01-03. I am unable to give the snippet of my data as it is real time data. – Data Enthusiast Dec 28 '15 at 20:10
  • @UdayShankar for future, it's better to update your question with your data not in the comment – Anton Protopopov Dec 28 '15 at 21:32

2 Answers


As the warning says, you should use loc[row_indexer, col_indexer]. The boolean condition you already use to filter the DataFrame selects the rows you want, so pass it as the row indexer and, after the comma, give the column name:

df.loc[df['month'] == 'Feb', 'A'] = df.loc[df['month'] == 'Feb', 'B'] + df.loc[df['month'] == 'Feb', 'C'] + df.loc[df['month'] == 'Feb', 'D'] 
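
For reference, here is a minimal runnable sketch of the same idea. The sample values are made up for illustration and are not the asker's data. It stores the boolean mask in a variable (`feb`, a name chosen here) so the condition is written once, and then applies the same pattern to the date-indexed example from the question, using deterministic numbers instead of `np.random.randn` so the result is easy to check.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the asker's DataFrame.
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Feb", "Mar", "Apr"],
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [10.0, 20.0, 30.0, 40.0, 50.0],
    "C": [1.0, 1.0, 1.0, 1.0, 1.0],
    "D": [0.5, 0.5, 0.5, 0.5, 0.5],
})

# Build the boolean row mask once and reuse it.
feb = df["month"] == "Feb"

# One .loc call selects rows (mask) and column (label) together,
# so the assignment writes into df itself rather than into a copy.
df.loc[feb, "A"] = df.loc[feb, "B"] + df.loc[feb, "C"] + df.loc[feb, "D"]

# Date-indexed variant, mirroring the example from the question.
dates = pd.date_range("1/1/2000", periods=8)
df2 = pd.DataFrame(
    np.arange(32, dtype=float).reshape(8, 4),
    index=dates,
    columns=["A", "B", "C", "D"],
)

# With a DatetimeIndex, the row label is the timestamp itself.
day = pd.Timestamp("2000-01-03")
df2.loc[day, "A"] = df2.loc[day, "B"] + df2.loc[day, "C"] + df2.loc[day, "D"]
```

A boolean mask works with .loc whatever the index looks like, so a DataFrame that has been through .reset_index() should not need its index set again for this kind of update.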
Anton Protopopov

While not being the most beautiful, the way I would achieve your goal (without explicitly iterating over the rows) is:

df.ix[df['month'] == 'Feb', 'A'] = df[df['month'] == 'Feb']['B'] + df[df['month'] == 'Feb']['C'] + df[df['month'] == 'Feb']['D']

Note: ix was deprecated in pandas 0.20.0 and removed in pandas 1.0.0. On current versions, replace df.ix with df.loc in the line above.
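
On pandas versions where ix is no longer available, a somewhat tidier variant might look like the sketch below. It relies on .loc aligning an assigned Series on the index, so the right-hand side can be computed over all rows and only the masked rows are written. The sample values and the `mask` name are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar"],
    "A": [0.0, 0.0, 0.0],
    "B": [1.0, 2.0, 3.0],
    "C": [10.0, 20.0, 30.0],
    "D": [100.0, 200.0, 300.0],
})

# Compute the row condition once.
mask = df["month"] == "Feb"

# The right-hand side covers every row; .loc aligns it on the index,
# so only the rows selected by the mask receive new values.
df.loc[mask, "A"] = df["B"] + df["C"] + df["D"]
```

Computing the sum over every row does some extra work on a large DataFrame, so filtering the right-hand side with the same mask may be preferable when only a small fraction of rows match.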

jpp
DeepSpace