I want to add two columns on a subset of a dataframe that meets some condition, and store the result in a new column. I'm not sure how to do this, because I get an error when I try to access the new column created in the process:
import pandas as pd
d1={'X':[1,10,100,1000,1,10,100,1000,1,10,100,1000],
'Y':[0.2,0.5,0.4,1.2,0.1,0.25,0.2,0.6,0.05,0.125,0.1,0.3],
'RUN':[1,1,1,1,2,2,2,2,3,3,3,3]
}
df=pd.DataFrame(d1)
for RUNno in (df.RUN.unique()):
    df1=df.RUN==RUNno #Selects the rows matching RUNno
    df[df1]['NewColumn']=df[df1]['X']+df[df1]['Y'] #For the selected dataset, calculates the sum of two columns and creates a new column
    print(df[df1].NewColumn) #Print the contents of the new column
I can't get the contents of df[df1].NewColumn, because pandas does not find NewColumn. I'm fairly sure this way of creating a new column works on the full dataframe df, but I don't understand why it doesn't work on df[df1]. For example:
df['NewColumn']=df['X']+df['Y']
df.NewColumn
works without any problem.
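For comparison, here is a minimal sketch of the same per-run loop written with a single .loc call. It rests on the assumption that the problem is chained indexing: df[df1] returns a copy, so the assignment in my loop lands on a temporary object and never reaches df. The names run_no and mask are only illustrative and are not part of my original code.

```python
import pandas as pd

d1 = {'X': [1, 10, 100, 1000, 1, 10, 100, 1000, 1, 10, 100, 1000],
      'Y': [0.2, 0.5, 0.4, 1.2, 0.1, 0.25, 0.2, 0.6, 0.05, 0.125, 0.1, 0.3],
      'RUN': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]}
df = pd.DataFrame(d1)

for run_no in df.RUN.unique():
    # Boolean Series that is True for the rows of the current run
    mask = df.RUN == run_no
    # One .loc call selects rows and column together and writes into df itself
    df.loc[mask, 'NewColumn'] = df.loc[mask, 'X'] + df.loc[mask, 'Y']
    # Reading back through .loc shows the values for this run only
    print(df.loc[mask, 'NewColumn'])
```

On the first iteration the new column is created with NaN in the rows that are not selected, and later iterations fill those rows in.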
Update: in my actual case, the values that are added to form the new column come from two different dataframes.
import pandas as pd
from scipy.interpolate import interp1d
interpolating_functions=dict()
d1={'X':[1,10,100,1000,1,10,100,1000,1,10,100,1000],
'Y':[0.2,0.5,0.4,1.2,0.1,0.25,0.2,0.6,0.05,0.125,0.1,0.3],
'RUN':[1,1,1,1,2,2,2,2,3,3,3,3] }
d2={'X':[1,10,100,1000,1,10,100,1000,1,10,100,1000],
'Y':[0.2,0.5,0.4,1.2,0.1,0.25,0.2,0.6,0.05,0.125,0.1,0.3],
'RUN':[1,1,1,1,2,2,2,2,3,3,3,3] }
df=pd.DataFrame(d1)
df2=pd.DataFrame(d2)
for RUNno in (df.RUN.unique()):
    df1=df.RUN==RUNno #Selects the rows of df matching RUNno
    df3=df2.RUN==RUNno #Selects the rows of df2 matching RUNno
    interpolating_functions[RUNno]=interp1d(df2[df3].X,df2[df3].Y)
    df[df1]['NewColumn']=df[df1]['X']+interpolating_functions[RUNno](df2[df3]['X'])
    print(df[df1].NewColumn)
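A sketch of how the two-dataframe version might look with .loc, under the same assumption that chained indexing is what prevents the column from being created. Two things here differ from my code above and are assumptions: the interpolating function is evaluated at the X values of df rather than of df2, so that the result always has as many entries as the selected rows of df, and the names run_no, mask, mask2 and f are only illustrative.

```python
import pandas as pd
from scipy.interpolate import interp1d

d = {'X': [1, 10, 100, 1000, 1, 10, 100, 1000, 1, 10, 100, 1000],
     'Y': [0.2, 0.5, 0.4, 1.2, 0.1, 0.25, 0.2, 0.6, 0.05, 0.125, 0.1, 0.3],
     'RUN': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]}
df = pd.DataFrame(d)
df2 = pd.DataFrame(d)  # same values as df, as in the example above

interpolating_functions = {}
for run_no in df.RUN.unique():
    mask = df.RUN == run_no    # rows of df for this run
    mask2 = df2.RUN == run_no  # rows of df2 for this run
    # Build the interpolator from plain NumPy arrays taken from df2
    f = interp1d(df2.loc[mask2, 'X'].to_numpy(), df2.loc[mask2, 'Y'].to_numpy())
    interpolating_functions[run_no] = f
    # f returns a NumPy array; adding it to a Series keeps the Series index,
    # so the assignment lines up with the selected rows of df
    df.loc[mask, 'NewColumn'] = df.loc[mask, 'X'] + f(df.loc[mask, 'X'].to_numpy())
    print(df.loc[mask, 'NewColumn'])
```

With the default settings interp1d raises an error for points outside the range of the X values it was built from, so this only works as long as the X values of df stay within those of df2 for each run.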