
I have a data frame with a few thousand rows. It looks something like this:

ID Amount Segment
1  23     A
2  43     B
3  65     A
4  23     A
5  86     C
6  54     B
7  432    B
8  987    A
9  43     C
10 46     C

First I had to split the data by segment, which I did like this:

df_A = df[(df['Segment'] == 'A')]
df_B = df[(df['Segment'] == 'B')]
df_C = df[(df['Segment'] == 'C')]
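For reference, the same subsets can be built without naming each segment by hand. The following is only a minimal sketch using the sample rows shown above; the dictionary name `segments` is a hypothetical name chosen for illustration.

```python
import pandas as pd

# Sample rows copied from the table in the question.
df = pd.DataFrame({
    "ID": range(1, 11),
    "Amount": [23, 43, 65, 23, 86, 54, 432, 987, 43, 46],
    "Segment": ["A", "B", "A", "A", "C", "B", "B", "A", "C", "C"],
})

# Iterating over a groupby yields (label, sub-frame) pairs, so one
# dictionary comprehension collects a sub-frame per distinct Segment value.
# `segments` is a hypothetical name, not something from the original code.
segments = {key: grp for key, grp in df.groupby("Segment")}

print(segments["A"])
```

Here `segments["A"]` holds the same rows as `df[df['Segment'] == 'A']`, and a new segment value in the data needs no extra line of code.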

After that I had to perform some operations on each subset, including groupby and other functions. For example, for segment A:

df_A['days'] = (df_A['first'] - df_A['last']).dt.days
df_A_A = df_A[(df_A['days'] >= 0) & (df_A['days'] <= 30)]
A = df_A_A.groupby('days').user.nunique().reset_index()
A['user'] = A['user'].cumsum()
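One way to avoid repeating these four steps per segment might be to wrap them in a function. This is a sketch under assumptions: the sample table does not show the `first`, `last` and `user` columns, so the code assumes `first` and `last` are datetime columns and `user` is an identifier column. The function name `cumulative_users` and the demo data are hypothetical.

```python
import pandas as pd

def cumulative_users(seg_df, max_days=30):
    """Cumulative count of distinct users per day offset for one segment.

    Hypothetical helper: assumes datetime columns `first` and `last`
    and an identifier column `user`.
    """
    out = seg_df.copy()  # work on a copy so the caller's frame is untouched
    out["days"] = (out["first"] - out["last"]).dt.days
    # Keep rows with 0 <= days <= max_days (between is inclusive by default).
    window = out[out["days"].between(0, max_days)]
    counts = window.groupby("days")["user"].nunique().reset_index()
    counts["user"] = counts["user"].cumsum()
    return counts

# Made-up rows, only to exercise the function.
demo = pd.DataFrame({
    "user": ["u1", "u2", "u2", "u3", "u4"],
    "first": pd.to_datetime(
        ["2021-01-05", "2021-01-05", "2021-01-07", "2021-01-10", "2021-03-01"]
    ),
    "last": pd.to_datetime(["2021-01-04"] * 5),
})

result = cumulative_users(demo)
print(result)
```

Copying the subset first also avoids the `SettingWithCopyWarning` that pandas can raise when a new column is assigned on a filtered slice.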

Here I am creating two further data frames for each subset (df_A_A and A) and finally plotting the data frame A (and B and C for the other two subsets).

In the end I plot each set:

plt.plot(A['x'], A['y'], color='red', label='A')
plt.plot(B['x'], B['y'], color='blue', label='B')
plt.plot(C['x'], C['y'], color='green', label='C')

The problem is that I may have any number of segments, so it would be easier to do all of these operations inside one loop. Is that possible? I want to write the code for a single segment and get the desired output for every segment. I tried grouping by segment in a loop, but I am not sure how to set it up so that it also creates the df_A_A and A data frames along the way.

This is what I tried:

for key, grp in df.groupby(['Segment']):




df_grp = df[(df['Segment'] == grp)]
df_grp['days'] = (df_grp['first'] - df_grp['last']).dt.days

df_grp_1 = df_grp[(df_grp['days'] >= 0) & (df_grp['days'] <= 30)]
grp = df_grp_1.groupby('days').user.nunique().reset_index()
grp['user'] = grp['user'].cumsum()


plt.plot(grp['days'], grp['conv'], color='key', label=key)
plt.legend()
plt.xlabel('days')
plt.ylabel('conv')
plt.show()

I am getting this error:

File "<ipython-input-6-7a4977f42a83>", line 10
    df_grp = df[(df['Segment'] == grp)]
         ^
IndentationError: expected an indented block
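The `IndentationError` is raised because the statements after the `for` line are not indented, so Python finds no loop body. A few other details in the attempt would also likely fail once the indentation is fixed: `grp` is already the subset for the current segment, so comparing `df['Segment'] == grp` is not needed; `color='key'` passes the literal string `'key'`, which is not a valid matplotlib colour; and the grouped result has a `user` column, not `conv`. The following is only a sketch of how the loop might look, assuming `first` and `last` are datetime columns and `user` is an identifier column; the data and the `curves` dictionary are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data: the columns `user`, `first` and `last` are assumptions.
df = pd.DataFrame({
    "Segment": ["A", "A", "A", "B", "B", "B"],
    "user": ["u1", "u2", "u3", "u4", "u5", "u6"],
    "first": pd.to_datetime(
        ["2021-01-02", "2021-01-03", "2021-01-03",
         "2021-01-02", "2021-01-06", "2021-04-01"]
    ),
    "last": pd.to_datetime(["2021-01-01"] * 6),
})

fig, ax = plt.subplots()
curves = {}  # hypothetical container: one result frame per segment

for key, grp in df.groupby("Segment"):
    # Everything that belongs to the loop is indented under the `for` line.
    seg = grp.copy()  # grp already contains only the rows of this segment
    seg["days"] = (seg["first"] - seg["last"]).dt.days
    seg = seg[(seg["days"] >= 0) & (seg["days"] <= 30)]
    counts = seg.groupby("days")["user"].nunique().reset_index()
    counts["user"] = counts["user"].cumsum()
    curves[key] = counts
    # No explicit colour: matplotlib picks a different one for each line.
    ax.plot(counts["days"], counts["user"], label=key)

# Legend and axis labels are set once, after all segments are drawn.
ax.legend()
ax.set_xlabel("days")
ax.set_ylabel("conv")
```

In an interactive session, a call to `plt.show()` after the loop displays the figure. Grouping by the string `"Segment"` rather than the list `['Segment']` keeps `key` a plain label, which matters because newer pandas versions return a one-element tuple for a list key.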

Thanks in advance!

Abhishek Singh
  • you can `groupby` then `plot` check this https://stackoverflow.com/questions/41494942/pandas-dataframe-groupby-plot – Epsi95 Aug 24 '21 at 03:14
  • @Epsi95 But I have other operations as well not just plot. – Abhishek Singh Aug 24 '21 at 03:16
  • `groupby` is iterable as demonstrated [in the second answer](https://stackoverflow.com/a/56652699/15497888) `for key, grp in df.groupby('Segment'):` Then in the loop body would be something like: `plt.plot(grp['x'], grp['y'], label=key)` – Henry Ecker Aug 24 '21 at 03:17
  • @HenryEcker Yes I understood this but this does not solve my problem where I am doing some other operations. Maybe I will have to edit and include those pointers as well. – Abhishek Singh Aug 24 '21 at 03:27
  • Yes. The way your question currently reads a loop over groupby addresses the issue. The loop produces a dataframe for each unique value in Segment which is exactly the same as subsetting manually with an index selection. You can perform any and all dataframe operations needed on `grp` before plotting. – Henry Ecker Aug 24 '21 at 03:31
  • And what is the reason you cannot do those operations in the loop on `grp`? – Henry Ecker Aug 24 '21 at 16:38
  • Getting errors. I mean the grp contains A B and C. To create the subsets df_A, df_A_A, and A I used grp but getting different errors. I will post them. – Abhishek Singh Aug 24 '21 at 17:02
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/236365/discussion-between-abhishek-singh-and-henry-ecker). – Abhishek Singh Aug 24 '21 at 17:12

0 Answers