I have a pandas DataFrame that looks similar to this:
|Ind| C1 | C2 |....| Cn |
|---|----|----|----|----|
| 1 |val1| AE |....|time|
| 2 |val2| FB |....|time|
|...|....| .. |....| ...|
| n |valn| QK |....|time|
I have to group it by column C2, do some filtering on each group, and store the results in a separate file for each group.
Grouped DataFrame:
Subset 1:
|Ind| C1 | C2 |....| Cn |
|---|----|----|----|----|
| 1 |val1| AE |....|time|
| 2 |val2| AE |....|time|
|...|....| .. |....| ...|
| n |valn| AE |....|time|
Subset 2:
|Ind| C1 | C2 |....| Cn |
|---|----|----|----|----|
| 1 |val1| FB |....|time|
| 2 |val2| FB |....|time|
|...|....| .. |....| ...|
| n |valn| FB |....|time|
and so on.
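My current approach looks similar to this:

    import pandas as pd

    def my_filter_function(self, df):
        # Parentheses are required: & binds more tightly than !=
        result = df[df["C1"].notna() & (df["Cn"] != 'Some value')]
        # to_csv is a DataFrame method, not a module-level pandas function
        result.to_csv(...)

    df = pd.read_csv(...)
    df.groupby("C2").apply(lambda x: self.my_filter_function(x))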
My problem is that pandas calls the applied function twice on the first group, as mentioned here, here and in the docs, so the file for the first group is written twice. Is there any way to avoid this, or can you suggest another approach? Is it also possible to keep the grouping after the apply call?
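One alternative that might sidestep the problem is to skip apply and iterate over the groups directly, since a GroupBy object yields one (key, sub-frame) pair per group. The following is only a minimal sketch under assumptions: the helper names `filter_group` and `write_groups`, the toy data, the temporary output directory, and the `<key>.csv` naming scheme are all hypothetical and not part of my real code.

```python
import os
import tempfile

import pandas as pd


def filter_group(group: pd.DataFrame) -> pd.DataFrame:
    """Keep rows where C1 is present and Cn differs from a placeholder value."""
    # Parentheses matter: & binds more tightly than !=
    return group[group["C1"].notna() & (group["Cn"] != "Some value")]


def write_groups(df: pd.DataFrame, out_dir: str) -> dict:
    """Filter each C2 group and write it to its own CSV file."""
    written = {}
    # Iterating over the GroupBy visits every group exactly once
    for key, group in df.groupby("C2"):
        result = filter_group(group)
        # Hypothetical naming scheme: one file per group key
        path = os.path.join(out_dir, f"{key}.csv")
        result.to_csv(path, index=False)
        written[key] = path
    return written


# Toy data standing in for the real file (assumed values)
df = pd.DataFrame({
    "C1": ["val1", None, "val3", "val4"],
    "C2": ["AE", "AE", "FB", "FB"],
    "Cn": ["t1", "t2", "Some value", "t4"],
})
out_dir = tempfile.mkdtemp()
written = write_groups(df, out_dir)
```

The returned dictionary maps each C2 value to the file written for it, so the association between group key and result is still available after the loop.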
Regards