2

I've been trying to get some posted similar solutions to work with no luck.

I am trying to get histograms for Cost for all the Step No in our manufacturing process. There are a different number of steps for each part, so I want to have a set of histograms on one plot/image for each part.

In my real data there are many parts so if this could loop through many parts and save the graphs that would be ideal.

Additionally we have a target cost for each step that I want to overlay on the histogram. This is represented in a separate dataframe. I got stuck on the loop for the subplots so I didn't try this yet.

Here's as close as to what I can find for what each step histogram should look like: Example of what I'm picturing

Here is my code so far:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel('Dist_Example.xlsx')
df1 = df[~df['Cost Type'].isin(['Material'])]
number_of_subplots = len(df1['Step No'].unique())
steps = df1['Step No'].unique()
fig, axs = plt.subplots(1, number_of_subplots, sharey = True, tight_layout=True)
for step in steps:
    df2 = df1[df1['Step No'].isin([step])]
    axs[step].hist(df2['Cost'])
plt.show()

Thank you in advance for helping me!

Here is the Target Cost I'd like to be shown as vertical line on the histogram:

PartNo  StepNo  TargetCost
ABC     10      12
ABC     20      20
ABC     30     13

Here's some sample historical data which should be in bins in the histogram:

PartNo  SerialNo    StepNo  CostType    Cost
ABC      123        10      Labor       11
ABC      123        10      Material    16
ABC      456        10      Labor       21
ABC      456        10      Material    26
ABC      789        10      Labor       21
ABC      789        10      Material    16
ABC      1011       10      Labor       11
ABC      1011       10      Material    6
ABC      1112       10      Labor       1
ABC      1112       10      Material    -4
ABC      123        20      Labor       11
ABC      123        20      Material    19
ABC      456        20      Labor       24
ABC      456        20      Material    29
ABC      789        20      Labor       24
ABC      789        20      Material    19
ABC      1011       20      Labor       14
ABC      1011       20      Material    9
ABC      1112       20      Labor       4
ABC      1112       20      Material    -1
ABC      123        30      Labor       11
ABC      123        30      Material    13
ABC      456        30      Labor       18
ABC      456        30      Material    23
ABC      789        30      Labor       18
ABC      789        30      Material    13
ABC      1011       30      Labor       8
ABC      1011       30      Material    3
ABC      1112       30      Labor       -2
ABC      1112       30      Material    -7

And a second sample dataset:

PartNo  SerialNo    StepNo  CostType    Cost
DEF     Aplha       10  Labor   2
DEF     Zed         10  Labor   3
DEF     Kelly       10  Labor   4
DEF     Aplha       20  Labor   3
DEF     Zed         20  Labor   2
DEF     Kelly       20  Labor   5
DEF     Aplha       30  Labor   6
DEF     Zed         30  Labor   7
DEF     Kelly       30  Labor   5
DEF     Aplha       40  Labor   3
DEF     Zed         40  Labor   4
DEF     Kelly       40  Labor   2
DEF     Aplha       50  Labor   8
DEF     Zed         50  Labor   9
DEF     Kelly       50  Labor   7
vestland
  • 55,229
  • 37
  • 187
  • 305
Programming_Learner_DK
  • 1,509
  • 4
  • 23
  • 49
  • I've taken the liberty to edit the column names so that the datasets con be more easily picked up using `pd.read_clipboard(sep='\\s+')` – vestland Aug 31 '18 at 11:23

1 Answers1

0

You won't find a histogram function that solves this directly for your dataset. You'll need to aggregate the data in a way that suits your needs, and then represent your findings with a bar plot.

I find your objective and data a bit confusing, but I think I've figured out what you're after given these assumptions:

  1. You want to aggregate the cost per StepNo
  2. Cost type is irrelevant
  3. Total Target cost will have to be calculated since you are aggregating all costs within each StepNo.

The plot

enter image description here

EDIT

This is not what OP was looking for. After a bit of back and forth we found a solution that seemed to work

(from the question) I am trying to get histograms for Cost for all the Step No

(from a comment) I actually want to have a historgram for the sum of the cost per serial no in each step.

Since you've got to have count or frequency on the y-axis in a histogram, you will have to aggregate the data in some way that makes sense. Below you'll see the count for a bin number of choice for aggregated costs of each SerialNO at each step.

Result:

enter image description here

Code:

import pandas as pd
import numpy as np

import matplotlib.pyplot as plt
import numpy as np
import matplotlib.pyplot as plt
import pylab


# Load data in two steps:
# df1 = pd.read_clipboard(sep='\\s+')
# Part No Serial No   Step No Cost Type   Cost
# ABC      123        10      Labor       11
# ABC      123        10      Material    16
# ABC      456        10      Labor       21
# ABC      456        10      Material    26
# ...

# df2 = pd.read_clipboard(sep='\\s+')
# Part No Step No Target Cost
# ABC     10      12
# ABC     20      20
# ABC     30     13

# Cost type and SerialNo irrelevant
df11 = df1.drop(['CostType'] , axis = 1)

# Aggregate by StepNo, find total cost and count
##df12 = df11.groupby(['PartNo', 'StepNo']).agg(['sum', 'count']).reset_index()
df12 = df11.groupby(['PartNo', 'StepNo', 'SerialNo']).agg(['sum', 'count']).reset_index()

df12.columns = ['PartNo', 'StepNo', 'SerialNo', 'Cost', 'Count']
df3 = pd.merge(df2, df12, how = 'left', on = ['PartNo', 'StepNo'])

# Calculate total target cost
df3['TargetTotal'] = df3['TargetCost']*df3['Count']

# pylab.rcParams['figure.figsize'] = (2, 1)

def multiHist(x_data, x_label, bins):

    # Hisrogram setup
    fig, ax = plt.subplots()
    ax.hist(x_data, bins=bins, color='blue', alpha=0.5, histtype='stepfilled')

    # Horizontal line
    x0 = dfs['TargetTotal'].iloc[0]
    ax.axvline(x0, color='red', linewidth=2)

    # Annotation
    ax.annotate('Target: {:0.2f}'.format(x0), xy=(x0, 1), xytext=(-15, 15),
            xycoords=('data', 'axes fraction'), textcoords='offset points',
            horizontalalignment='left', verticalalignment='center',
            arrowprops=dict(arrowstyle='-|>', fc='white', shrinkA=0, shrinkB=0,
                            connectionstyle='angle,angleA=0,angleB=90,rad=10'),)

    # Labels
    ax.set_xlabel(x_label, color = 'grey')
    ax.legend(loc='upper left')
    plt.show()

# Identify and plot  data for each StepNo
for step in df3['StepNo'].unique():
    dfs = df3[df3['StepNo']==step]

    # Data to plot
    cost = dfs['Cost']
    labels = 'Part: ' + dfs['PartNo'].iloc[0] + ', ' 'Step:' + str(dfs['StepNo'].iloc[0])

    # Plot
    multiHist(x_data = cost, x_label = labels, bins = 4)    
vestland
  • 55,229
  • 37
  • 187
  • 305
  • Thank you for helping me and the edits. I actually want to have a historgram for the sum of the cost per serial no in each step. The step order should be maintained, and it would be one large plt for the entire part. So there would be a histogram subplot for step 10, step 20, step 30 sharing the axis and then a vertical horizontal line for the cost model cost. I apologize for not being clear enough in the original question. – Programming_Learner_DK Aug 31 '18 at 12:01
  • I'll gladly have a look at it again. A screenshot of some chart that resembles what you want would really help though – vestland Aug 31 '18 at 12:07
  • Cool! But I'm still a bit confused. I added a screenshot that shows cost for each SerialNo for StepNo = 10, for Part = ABC – vestland Aug 31 '18 at 14:30
  • Very close! The x-axis are the bin cost ranges (i.e. 10-20, 20-30, etc.) and the y-axis are the count of the serial nos in that bin (i.e. 3, 4, 2, etc.). I'd need to play with the bins in the real data to get something right. Make sense? – Programming_Learner_DK Aug 31 '18 at 14:34
  • And the the vertical line would be where the cost model cost fits in... forgot to mention that. – Programming_Learner_DK Aug 31 '18 at 14:35
  • I made an important change with regards to the target line. Until now it has shown average cost, and not the target. This is now corrected using `x0 = dfs['TargetTotal'].iloc[0]` – vestland Sep 06 '18 at 07:47