0

Is there a way to achieve this in Pandas?

Sample of my dataset:

date_time   version_A   spend_A version_B   spend_B
2022-07-30  User1   39734.582   User1   15354.253
2022-07-30  User2   11720.6742  User2   3486.8551
2022-07-30  User3   49015.5171  User3   18384.4266
2022-07-30  User4   23715.6717  User4   7944.3769
2022-07-30  User5   84249.8283  User5   47160.7129
... ... ... ... ...
2022-08-12  User1   20033.313   User1   8494.045
2022-08-12  User2   6045.3053   User2   2647.7135
2022-08-12  User3   30325.7474  User3   14720.9046
2022-08-12  User4   12838.4424  User4   12957.7918
2022-08-12  User5   111739.7095 User5   39797.8841

70 rows, 4 columns

On the x-axis I want the date_time, and for each x I have about two groups of 5 users each and I want to represent them as stacked bar chart.

I dropped column version_B then I tried grouping the data then plotting using this:

df.groupby(['version_a', pd.Grouper(freq='D')])['spend_a','spend_b'].sum().plot(kind='bar')

Which gives me a chart which is not decipherable. enter image description here

Essentially, I want those two columns to be side by side but they should be stacked in way that the bar is made up of the values on version_a with different colors.

enter image description here

Adding the option 'stacked' in plot() just stacks spend_a and spend_b which is not desired.

Any ideas?

EDIT: I found a partial solution for my problem: How to create a yearly bar plot grouped by months

df= pd.pivot_table(data=df, index=df.date_time_p1, 
               columns='version_B', values=['spend_A', 'spend_B'])

df.plot(kind='bar', figsize=(12, 4))

    spend_A                                          spend_B
version_A   User1   User2   User3   User4   User5   User1   User2   User3   User4   User5
date_time                                       
2022-07-30  119203.746  35162.0226  147046.5513 71147.0151  252749.4849 46062.759   10460.5653  55153.2798  23833.1307  141482.1387
2022-07-31  148194.771  46108.4352  195859.2123 136879.6758 433808.2329 54256.947   14023.0875  64238.7360  28703.0427  170297.5719
2022-08-01  154753.638  53772.0705  210497.8440 106826.5884 576303.9486 69089.559   17719.4886  67811.4348  26345.3781  220058.0895
2022-08-02  186336.963  67005.9321  383711.2353 91320.3990  551705.8338 168215.307  26589.2934  104063.0493 27809.8809  311763.5661
2022-08-03  28730.260   16887.6110  46366.1278  13737.8429  115621.7331 11571.692   5597.1872   15497.5675  6375.6251   83798.0708

But I still can't figure out a way to stack spend_A and spend_B as two separate columns. Using plot(stacked='True') groups everything into a single column.

Geosphere
  • 315
  • 4
  • 15

1 Answers1

0

Ok, so far many attempts and some research I managed to find a solution for the problem I'm dealing with.

Combining two stacked bar plots for a grouped stacked bar plot

What I did was to manipulate a bit the data spend in such a way that I would not have it formed as above so I ended up having two separate dataframes then I made a pivot table for each.

pivot_p1 = version_a.pivot_table(index='date_time', columns='version_A', 
                                   values='version_A')

I named each df as pivot.

fig, ax = plt.subplots(figsize=(12,6))


pivot_p1.plot(kind="bar", stacked=True, width=0.4, 
                  ax=ax, position=0)
pivot_p2.plot(kind="bar", stacked=True, width=0.4, 
                   ax=ax, position=1, legend=False);

enter image description here

Geosphere
  • 315
  • 4
  • 15