I apologize in advance, as this question is partly technical and partly about better understanding pandas/Python design.
Here's what I'm trying to do. Given this data:
a,b
1,2
1,3
2,4
2,5
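To make this reproducible, here is a minimal sketch of how the data above can be loaded into a DataFrame. Reading from an inline string (the `csv_text` variable) is just an assumption for illustration; in practice the data comes from elsewhere.

```python
import io

import pandas as pd

# Inline copy of the sample data shown above (illustrative only).
csv_text = """a,b
1,2
1,3
2,4
2,5
"""

# Parse the CSV text into a DataFrame with integer columns 'a' and 'b'.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```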
I want to create a third column, c, whose values are computed per group of column a from the data in that group. The way I have done it is:
for item in df.a.unique():
    # Select the rows belonging to the current group of 'a'
    dataSetToProcess = df.loc[df['a'] == item][['a', 'b']]
    dataSetToProcess['c'] = dataSetToProcess.apply(MyFunction)
With this approach I get a value for c, but only on dataSetToProcess, which is a separate object and not part of the main DF. I would like the values of c to be available in the main DF.
My expected results (for simplicity, let's say each group of column a just averages its b values and puts that average in column c):
a,b,c
1,2,2.5
1,3,2.5
2,4,4.5
2,5,4.5
My thinking was to take each group's result and map it back to the original DF on columns a and b, but I was wondering whether there is an easier/cleaner approach?
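For reference, here is a sketch of what I suspect a cleaner version might look like, using `groupby` with `transform` so the per-group result is written straight onto the main DF. This assumes the per-group computation only needs column b; `my_function` is a hypothetical stand-in for my real MyFunction and simply averages the group, as in the expected results above.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2, 2], "b": [2, 3, 4, 5]})


def my_function(values: pd.Series) -> float:
    """Hypothetical stand-in for MyFunction: average the group's 'b' values."""
    return values.mean()


# Group by 'a', run the function on each group's 'b' values, and broadcast
# the result back to every row of that group, aligned on the original index.
df["c"] = df.groupby("a")["b"].transform(my_function)
print(df)
```

Because `transform` returns a result with the same index as the original rows, no explicit loop or merge back on a and b should be needed in this simple case.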