1

I have the following dataframe:

Part_Number    Serial_Number    Timestamp    Feature    Machine    Tool    Rework
PN1    100    9/9/2020 8:26    FEAT_FN_H30    H10    93    1
PN1    101    9/9/2020 9:05    FEAT1_FN_H12    G3    85    2
PN1    102    9/9/2020 9:29    FEAT_FN_H23    H4    81    1
PN1    103    9/9/2020 10:53    FEAT1_FN_H15    H7    24    3
PN1    104    9/9/2020 10:53    FEAT1_FN_H17    H7    24    3
PN1    105    9/9/2020 10:53    FEAT1_FN_H19        24    1
PN1    106    9/9/2020 12:03    FEAT1_LF_LOC_BE    F7    6    2
PN1    107    9/9/2020 12:03    FEAT1_LF_LOC_FN            12
PN1    108    9/9/2020 12:33    FEAT_FN_H42        93    2
PN1    109    9/9/2020 12:33    FEAT_FN_H43    H10    83    1
PN1    110    9/9/2020 12:33    FEAT_FN_H44    H10    83    1
PN1    111    9/9/2020 12:33    FEAT_FN_H45    H10    83    1
PN1    112    9/9/2020 12:33    FEAT_FN_H46    H10    83    1

What I need is a bar chart with the sum of reworks performed for each feature.

I then need that bar broken down by what machine performed those reworks on that feature. I attempted to upload a photo of what I am looking for but kept receiving an error.

Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158
Maxwell
  • 149
  • 1
  • 12
  • Does this answer your question? [stacked bar plot using matplotlib](https://stackoverflow.com/questions/44309507/stacked-bar-plot-using-matplotlib) – Karthik Sep 10 '20 at 16:01
  • Does this answer your question? [Pandas - Plotting a stacked Bar Chart](https://stackoverflow.com/questions/23415500/pandas-plotting-a-stacked-bar-chart) – RichieV Sep 10 '20 at 16:53

1 Answers1

0
  • pandas.DataFrame.groupby the desired features and aggregate the sum
  • .unstack the grouped DataFrame from a long, to a wide format.
  • Remove level 0 of the multi-level column names created when unstacking
  • Plot the stacked bar, with pandas.DataFrame.plot
    • pandas.DataFrame.plot.bar
    • If you have a lot of 'Feature', use .barh for a horizontal bar, adjust figsize as needed, and change plt.xlabel('Rework') to ylabel.
import pandas as pd
import matplotlib.pyplot as plt

# read data
df = pd.read_csv('test.csv')

# display(df)
   Part_Number  Serial_Number       Timestamp          Feature Machine  Tool  Rework
0          PN1            100   9/9/2020 8:26      FEAT_FN_H30     H10  93.0       1
1          PN1            101   9/9/2020 9:05     FEAT1_FN_H12      G3  85.0       2
2          PN1            101   9/9/2020 9:05     FEAT1_FN_H12      G3  85.0       3
3          PN1            101   9/9/2020 9:05     FEAT1_FN_H12      G4  85.0       2
4          PN1            102   9/9/2020 9:29      FEAT_FN_H23      H4  81.0       1
5          PN1            103  9/9/2020 10:53     FEAT1_FN_H15      H7  24.0       3
6          PN1            104  9/9/2020 10:53     FEAT1_FN_H17      H7  24.0       3
7          PN1            105  9/9/2020 10:53     FEAT1_FN_H19     NaN  24.0       1
8          PN1            106  9/9/2020 12:03  FEAT1_LF_LOC_BE      F7   6.0       2
9          PN1            107  9/9/2020 12:03  FEAT1_LF_LOC_FN     NaN   NaN      12
10         PN1            108  9/9/2020 12:33      FEAT_FN_H42     NaN  93.0       2
11         PN1            109  9/9/2020 12:33      FEAT_FN_H43     H10  83.0       1
12         PN1            110  9/9/2020 12:33      FEAT_FN_H44     H10  83.0       1
13         PN1            111  9/9/2020 12:33      FEAT_FN_H45     H10  83.0       1
14         PN1            112  9/9/2020 12:33      FEAT_FN_H46     H10  83.0       1

# groupby and sum
dfg = df.groupby(['Feature', 'Machine']).agg({'Rework': sum}).unstack()

# fix column names
dfg.columns = dfg.columns.droplevel(0)

# sort the index by the sum of the rows
dfg = dfg.reindex(dfg.sum(axis=1).sort_values(ascending=False).index)

# plot the top 5
dfg.head().plot.bar(stacked=True, figsize=(8, 5), grid=True)
plt.ylabel('Rework')
plt.legend(title=dfg.columns.name, bbox_to_anchor=(1.05, 1), loc='upper left')

enter image description here

Test data

  • Extra data was added for testing, because there where no examples with a 'Feature' being produced by more than one 'Machine'.
Part_Number,Serial_Number,Timestamp,Feature,Machine,Tool,Rework
PN1,100,9/9/2020 8:26,FEAT_FN_H30,H10,93,1
PN1,101,9/9/2020 9:05,FEAT1_FN_H12,G3,85,2
PN1,101,9/9/2020 9:05,FEAT1_FN_H12,G3,85,3
PN1,101,9/9/2020 9:05,FEAT1_FN_H12,G4,85,2
PN1,102,9/9/2020 9:29,FEAT_FN_H23,H4,81,1
PN1,103,9/9/2020 10:53,FEAT1_FN_H15,H7,24,3
PN1,104,9/9/2020 10:53,FEAT1_FN_H17,H7,24,3
PN1,105,9/9/2020 10:53,FEAT1_FN_H19,,24,1
PN1,106,9/9/2020 12:03,FEAT1_LF_LOC_BE,F7,6,2
PN1,107,9/9/2020 12:03,FEAT1_LF_LOC_FN,,,12
PN1,108,9/9/2020 12:33,FEAT_FN_H42,,93,2
PN1,109,9/9/2020 12:33,FEAT_FN_H43,H10,83,1
PN1,110,9/9/2020 12:33,FEAT_FN_H44,H10,83,1
PN1,111,9/9/2020 12:33,FEAT_FN_H45,H10,83,1
PN1,112,9/9/2020 12:33,FEAT_FN_H46,H10,83,1
Trenton McKinney
  • 56,955
  • 33
  • 144
  • 158