0

I am trying to parse a csv file using a code like: (I have updated the code at the bottom. Please have a look)

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import cm

cmap = cm.get_cmap('tab20')
pd.set_option('display.max_columns', 4)
#  pd.set_option('display.max_rows', None)
dataset = pd.read_csv("mwe.csv")
dsc = dataset.columns
print(dataset.value_counts(subset=[dsc[1]]).reset_index(name='count'))
print(dataset.value_counts(subset=[dsc[1], dsc[0]]).reset_index(name='count'))

and a sample mwe is:

Quartile,KeyArea
Q2,Earth
Q1,Earth
Q1,Fire
,Fire
Q3,Fire
Q1,Space
Q3,Space
Q1,Space
Q4,Space
Q2,Space
,Space
Q2,Air
Q1,Air
Q1,Air
Q1,Air
,Air
Q2,Water 
Q2,Water 
Q1,Water 

The output of the code with the data is:

#Total
 KeyArea  count
0     Air      5
1   Space      5
2    Fire      3
3  Water       3
4   Earth      2

#Keyarea split by Q1..Q4.
   KeyArea Quartile  count
0      Air       Q1      3
1    Space       Q1      2
2   Water        Q2      2
3      Air       Q2      1
4    Earth       Q1      1
5    Earth       Q2      1
6     Fire       Q1      1
7     Fire       Q3      1
8    Space       Q2      1
9    Space       Q3      1
10   Space       Q4      1
11  Water        Q1      1

Now, I am trying to get a stack bar plot with something like:

ax.bar(labels, q1, label='Q1')
ax.bar(labels, q2, bottom=q1, label='Q2')
ax.bar(labels, extra, bottom=q1 + q2, label='Others', color='C3')

i.e. for each keyarea, stack will be like keyarea in q1, keyarea in q2, total keyarea-keyarea in q1+q2.

How I can do that?

Update:

I have updated the code with:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib import cm

cmap = cm.get_cmap('tab20')
pd.set_option('display.max_columns', 4)
#  pd.set_option('display.max_rows', None)
dataset = pd.read_csv("mwe.csv")
dsc = dataset.columns
dataset[dsc[1]].fillna("None", inplace=True)
print(dict(dataset.value_counts(subset=[dsc[0], dsc[1]]).sort_index()))

Which is giving the output:

{('Air', 'Q1'): 3, ('Air', 'Q2'): 1, ('Earth', 'Q1'): 1, ('Earth', 'Q2'): 1, ('Fire', ' Q1'): 1, ('Fire', 'Q3'): 1, ('Space', 'Q1'): 2, ('Space', 'Q2'): 1, ('Space', 'Q3'): 1, ('Space', 'Q4'): 1, ('Water', 'Q1'): 1, ('Water', 'Q2'): 2}

Now, I have to create a stack, say for Space. I need first stack for Q1, i.e. for ('Space', 'Q1'):2, second for (Space,Q2), and third with None+Q3+Q4.

Kindly help

NB:This is a restatement of the question create stackbar from tw different pandas output, as I have managed a mwe code and dataset. I will close the question.

BaRud
  • 3,055
  • 7
  • 41
  • 89
  • I don't get what you want to achieve. Do you simply want a stacked bar chart from the df with Quartile? – Henry Yik Aug 14 '21 at 10:52
  • @HenryYik: yes. But, the first output is the total number of Water,Space etc. In 2nd output, it is splitted in Q1,Q2,Q3,Q4 and may be empty. So, in the stack, I want stack of 3, with Water from Q1, water from Q2 and Total water -(Water from Q1 and Water from Q2). I can easily plot Water from Q1 and Q2. I dont know, since the total number is in another output, how to get the 3rd stack. – BaRud Aug 14 '21 at 11:14
  • If its a stacked bar, the bars would stack up to the total amount. why do you need to plot the total water seperately? – Henry Yik Aug 14 '21 at 11:17
  • @HenryYik: because, as I said, Stack is from Q1 , Q2, and total. Total is (if you have read the last comment, Q1+Q2+Q3+Q4+ nothing), which is not equal to Q1+Q2, as described in the 3rd sentence of my previous comment. – BaRud Aug 14 '21 at 13:17
  • So, for example, take stack of "Space" . Here, Q1=2, Q2=Q3=Q4=none=1. So, bottom stack should be of height 2 for Q1, middle is of height 1 for Q2, and for third stack is from total - (stack of Q1+Q2)=3. – BaRud Aug 14 '21 at 13:22
  • @HenryYik: I have updated the code. Please have a look, if your time permits – BaRud Aug 14 '21 at 16:25
  • Doesn't that means you simply want to add up Q3 and Q4 as an extra column? The answer posted already `pivot` the data, so just add Q3 and Q4 and plot it. – Henry Yik Aug 14 '21 at 16:36
  • @HenryYik: Please try to understand my problem. I dont have much experience in python data. Actually This is the first time I am seeing a dict key as a set of two value. So, forget about adding Q3+Q4+None, I dont know how to stack the (Fire,Q1) value itself. I am totally lost. – BaRud Aug 14 '21 at 16:56
  • Please have a look a the [updated answer](https://stackoverflow.com/a/68782775/16343464). I showed two examples of data. 1. raw quartiles plotted as stacked. 2. same with added "other" to sum up to the values in the first dataframe. – mozway Aug 14 '21 at 17:00

1 Answers1

1

Assuming your quartile dataset is named quartiles_df, you do not need the totals to plot a stacked bar, pandas can do it for you. You need to pivot your data first:

quartiles_df.pivot(*quartiles_df).plot.bar(stacked=True)

Below are a few graphical outputs:

pivoted_df = (quartiles_df.pivot(index='KeyArea',
                                 columns='Quartile',
                                 values='count')
              )
pivoted_df.plot.bar(stacked=True)

stacked bars

Complement to total (total_df):

(pivoted_df.assign(other=total_df.set_index('KeyArea')['count']-pivoted_df.sum(axis=1))
           .plot.bar(stacked=True)
)

complement to total

mozway
  • 194,879
  • 13
  • 39
  • 75
  • can you please also have a look at https://stackoverflow.com/questions/68789629/pandas-plot-split-long-xtickslabels, if your time permits? – BaRud Aug 15 '21 at 07:47
  • It looks like you already have a nice answer there ;) – mozway Aug 15 '21 at 08:08