
I have a CSV file (the master file) that contains details of all the products in our repository, along with their respective timing details (Time_In and Time_Out). I have read the CSV file into a dataframe using pandas (a minimal sketch of that step is shown after the sample data below).

Can someone please tell me how I can plot all the products (Product_A1, ..., Product_A7) against their respective Time_In and Time_out values in a single plot using Python?

What I need is something like this.

[Required image: example of the desired plot]

master_df.csv

Date_A1,Product_A1,TIme_In_A1,TIme_out_A1,Date_A2,Product_A2,TIme_In_A2,TIme_out_A2,Date_A3,Product_A3,TIme_In_A3,TIme_out_A3,Date_A4,Product_A4,TIme_In_A4,TIme_out_A4,Date_A5,Product_A5,TIme_In_A5,TIme_out_A5,Date_A6,Product_A6,TIme_In_A6,TIme_out_A6,Date_A7,Product_A7,TIme_In_A7,TIme_out_A7
07-06-2023,A1,00:00:00,00:45:00,07-06-2023,A2,00:00:00,01:03:00,07-06-2023,A3,00:00:00,00:30:00,07-06-2023,A4,00:00:00,00:15:00,07-06-2023,A5,00:00:00,00:33:00,07-06-2023,A6,00:00:00,00:48:00,07-06-2023,A7,04:00:00,05:03:00
07-06-2023,A1,00:48:00,01:00:00,07-06-2023,A2,01:06:00,01:18:00,07-06-2023,A3,00:33:00,03:33:00,07-06-2023,A4,00:18:00,02:15:00,07-06-2023,A5,00:36:00,02:18:00,07-06-2023,A6,00:51:00,01:06:00,07-06-2023,A7,05:06:00,06:21:00
07-06-2023,A1,01:03:00,01:15:00,07-06-2023,A2,01:21:00,02:45:00,07-06-2023,A3,03:36:00,04:30:00,07-06-2023,A4,02:18:00,03:18:00,07-06-2023,A5,02:21:00,03:36:00,07-06-2023,A6,01:09:00,01:21:00,07-06-2023,A7,06:24:00,07:03:00
07-06-2023,A1,01:18:00,01:30:00,07-06-2023,A2,02:48:00,03:00:00,07-06-2023,A3,04:33:00,05:30:00,07-06-2023,A4,03:21:00,04:18:00,07-06-2023,A5,03:39:00,05:48:00,07-06-2023,A6,01:24:00,01:33:00,07-06-2023,A7,07:06:00,09:09:00
07-06-2023,A1,01:33:00,01:45:00,07-06-2023,A2,03:03:00,03:15:00,07-06-2023,A3,05:33:00,06:15:00,07-06-2023,A4,04:21:00,04:33:00,07-06-2023,A5,05:51:00,06:03:00,07-06-2023,A6,01:36:00,02:48:00,07-06-2023,A7,09:12:00,10:00:00
07-06-2023,A1,01:48:00,03:30:00,07-06-2023,A2,03:18:00,03:48:00,07-06-2023,A3,06:18:00,07:18:00,07-06-2023,A4,04:36:00,04:45:00,07-06-2023,A5,06:06:00,06:48:00,07-06-2023,A6,02:51:00,03:03:00,07-06-2023,A7,10:03:00,12:30:00
07-06-2023,A1,03:33:00,03:45:00,07-06-2023,A2,03:51:00,04:00:00,07-06-2023,A3,07:21:00,08:45:00,07-06-2023,A4,04:48:00,05:33:00,07-06-2023,A5,06:51:00,07:00:00,07-06-2023,A6,03:06:00,03:21:00,07-06-2023,A7,12:33:00,12:48:00
07-06-2023,A1,03:48:00,04:00:00,07-06-2023,A2,04:03:00,04:15:00,07-06-2023,A3,08:48:00,09:03:00,07-06-2023,A4,05:36:00,05:45:00,07-06-2023,A5,07:03:00,07:30:00,07-06-2023,A6,03:24:00,05:00:00,07-06-2023,A7,12:51:00,14:00:00
07-06-2023,A1,08:00:00,09:00:00,07-06-2023,A2,04:18:00,05:15:00,07-06-2023,A3,09:06:00,11:45:00,07-06-2023,A4,05:48:00,06:18:00,07-06-2023,A5,07:33:00,08:51:00,07-06-2023,A6,05:03:00,07:21:00,07-06-2023,A7,14:03:00,14:48:00
07-06-2023,A1,09:03:00,10:15:00,07-06-2023,A2,05:18:00,06:00:00,07-06-2023,A3,11:48:00,12:00:00,07-06-2023,A4,06:21:00,06:45:00,07-06-2023,A5,08:54:00,09:06:00,07-06-2023,A6,07:24:00,07:33:00,07-06-2023,A7,14:51:00,16:18:00
07-06-2023,A1,10:18:00,10:45:00,07-06-2023,A2,06:03:00,06:30:00,07-06-2023,A3,16:00:00,16:15:00,07-06-2023,A4,06:48:00,08:48:00,07-06-2023,A5,09:09:00,09:30:00,07-06-2023,A6,07:36:00,08:30:00,07-06-2023,A7,16:21:00,17:03:00
07-06-2023,A1,10:48:00,11:15:00,07-06-2023,A2,06:33:00,07:15:00,07-06-2023,A3,16:18:00,16:30:00,07-06-2023,A4,08:51:00,10:18:00,07-06-2023,A5,09:33:00,11:33:00,07-06-2023,A6,08:33:00,10:21:00,07-06-2023,A7,17:06:00,19:15:00
07-06-2023,A1,11:18:00,11:30:00,07-06-2023,A2,07:18:00,08:00:00,07-06-2023,A3,16:33:00,17:45:00,07-06-2023,A4,10:21:00,10:30:00,07-06-2023,A5,11:36:00,12:03:00,07-06-2023,A6,10:24:00,13:21:00,07-06-2023,A7,19:18:00,19:33:00
07-06-2023,A1,11:33:00,14:15:00,07-06-2023,A2,12:00:00,13:00:00,07-06-2023,A3,17:48:00,18:45:00,07-06-2023,A4,10:33:00,11:18:00,07-06-2023,A5,12:06:00,12:15:00,07-06-2023,A6,13:24:00,14:39:00,07-06-2023,A7,19:36:00,21:48:00
07-06-2023,A1,14:18:00,14:30:00,07-06-2023,A2,13:03:00,13:30:00,07-06-2023,A3,18:48:00,19:00:00,07-06-2023,A4,11:21:00,11:48:00,07-06-2023,A5,12:18:00,12:45:00,07-06-2023,A6,14:42:00,14:45:00,07-06-2023,A7,21:51:00,22:03:00
07-06-2023,A1,14:33:00,15:15:00,07-06-2023,A2,13:33:00,13:45:00,07-06-2023,A3,19:03:00,22:15:00,07-06-2023,A4,11:51:00,12:00:00,07-06-2023,A5,12:48:00,13:18:00,07-06-2023,A6,14:48:00,16:00:00,07-06-2023,A7,22:06:00,22:33:00
07-06-2023,A1,15:18:00,16:45:00,07-06-2023,A2,13:48:00,16:48:00,07-06-2023,A3,22:30:00,01:00:00,07-06-2023,A4,12:03:00,13:15:00,07-06-2023,A5,13:21:00,13:36:00,07-06-2023,A6,16:03:00,17:00:00,07-06-2023,A7,22:36:00,23:03:00
07-06-2023,A1,16:48:00,17:30:00,07-06-2023,A2,16:51:00,19:45:00,,,,,07-06-2023,A4,13:18:00,13:33:00,07-06-2023,A5,13:39:00,14:36:00,07-06-2023,A6,17:03:00,17:15:00,07-06-2023,A7,23:06:00,23:15:00
07-06-2023,A1,17:33:00,18:30:00,07-06-2023,A2,19:48:00,20:00:00,,,,,07-06-2023,A4,13:36:00,14:33:00,07-06-2023,A5,14:39:00,16:51:00,07-06-2023,A6,17:18:00,21:33:00,07-06-2023,A7,23:18:00,01:57:00
07-06-2023,A1,18:33:00,20:30:00,07-06-2023,A2,20:03:00,20:15:00,,,,,07-06-2023,A4,14:36:00,15:00:00,07-06-2023,A5,16:54:00,17:33:00,07-06-2023,A6,21:36:00,23:33:00,,,,
07-06-2023,A1,20:33:00,21:30:00,07-06-2023,A2,20:18:00,20:33:00,,,,,07-06-2023,A4,15:03:00,16:00:00,07-06-2023,A5,17:36:00,19:30:00,07-06-2023,A6,23:36:00,00:00:00,,,,
07-06-2023,A1,21:33:00,21:45:00,07-06-2023,A2,20:36:00,20:45:00,,,,,07-06-2023,A4,20:00:00,20:48:00,07-06-2023,A5,19:33:00,19:48:00,,,,,,,,
07-06-2023,A1,21:48:00,22:15:00,07-06-2023,A2,20:48:00,21:15:00,,,,,07-06-2023,A4,20:51:00,22:00:00,07-06-2023,A5,19:51:00,20:00:00,,,,,,,,
07-06-2023,A1,22:18:00,22:30:00,07-06-2023,A2,21:18:00,22:18:00,,,,,07-06-2023,A4,22:03:00,00:48:00,,,,,,,,,,,,
07-06-2023,A1,22:33:00,23:45:00,07-06-2023,A2,22:21:00,22:45:00,,,,,,,,,,,,,,,,,,,,
07-06-2023,A1,23:48:00,00:15:00,07-06-2023,A2,22:48:00,23:00:00,,,,,,,,,,,,,,,,,,,,
,,,,07-06-2023,A2,23:03:00,23:30:00,,,,,,,,,,,,,,,,,,,,
,,,,07-06-2023,A2,23:33:00,23:48:00,,,,,,,,,,,,,,,,,,,,
,,,,07-06-2023,A2,23:51:00,01:48:00,,,,,,,,,,,,,,,,,,,,
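
For reference, the reading step looks something like this (a minimal sketch; the actual file path and variable name on my side may differ):

import pandas as pd

# read the master file; each product occupies its own group of four columns
# (Date, Product, TIme_In, TIme_out)
master_df = pd.read_csv('master_df.csv')
print(master_df.head())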

2 Answers

  • This option plots a long dataframe with .barh, as shown in this answer.
  • Where 'TIme_out' is detected to be on the following day, based on its time component being smaller than 'TIme_In', 1 day is added to the date, so the visualization ends at the correct time for each 'Product'.
  • Tested in python 3.11.3, pandas 2.0.2, matplotlib 3.7.1, numpy 1.24.3
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np

# load the data
df = pd.read_csv('d:/data/2023-06-07_data_so.csv')

# convert the columns to datetime
for date, time_in, time_out in zip(df.columns[0::4], df.columns[2::4], df.columns[3::4]):
    df[time_in] = pd.to_datetime(df[date] + ' ' + df[time_in])
    df[time_out] = pd.to_datetime(df[date] + ' ' + df[time_out])    
    
    # if time out is less than time in, add 1 day to time out
    df[time_out] = np.where(df[time_out].lt(df[time_in]), df[time_out].add(pd.Timedelta(1, unit='D')), df[time_out])

# add an id column for use in wide_to_long
df['id'] = df.index

# convert the dataframe to long format
dfl = pd.wide_to_long(df.filter(regex='id|TIme'), stubnames=['TIme_In', 'TIme_out'], j='Product', sep='_', i='id', suffix='.+').reset_index(level=1).dropna()

# calculate the time difference
dfl['Width'] = dfl.TIme_out.sub(dfl.TIme_In)

# map Product to a color
cm = dict(zip(dfl.Product.unique(), ['#ff0000', '#92d050', '#00b0f0', '#ffff00', '#c55a11', '#7030a0', '#00b050']))
dfl['color'] = dfl.Product.map(cm)

# create fig and ax
fig, ax = plt.subplots(figsize=(15, 7), dpi=200, tight_layout=True)

# plot 
ax.barh(y='Product', width='Width', left='TIme_In', color='color', data=dfl, ec='k')

# set the x-axis limits
ax.set_xlim(dfl.TIme_In.min(), dfl.TIme_out.max())

# format the xticks and the interval
ax.xaxis.set_major_locator(mdates.MinuteLocator(byminute=range(0, 60, 30)))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))
fig.autofmt_xdate(rotation=90, ha="center")

[Resulting plot]

dfl.head()

   Product             TIme_In            TIme_out           Width    color
id                                                                         
0       A1 2023-07-06 00:00:00 2023-07-06 00:45:00 0 days 00:45:00  #ff0000
1       A1 2023-07-06 00:48:00 2023-07-06 01:00:00 0 days 00:12:00  #ff0000
2       A1 2023-07-06 01:03:00 2023-07-06 01:15:00 0 days 00:12:00  #ff0000
3       A1 2023-07-06 01:18:00 2023-07-06 01:30:00 0 days 00:12:00  #ff0000
4       A1 2023-07-06 01:33:00 2023-07-06 01:45:00 0 days 00:12:00  #ff0000
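
As an optional sanity check (a minimal sketch, assuming dfl from the code above), the intervals detected as crossing midnight can be listed, i.e. the rows where the out-timestamp falls on a later calendar date than the in-timestamp:

# rows where 1 day was added to 'TIme_out' by the crossover fix
crossover = dfl[dfl['TIme_out'].dt.normalize() > dfl['TIme_In'].dt.normalize()]
print(crossover[['Product', 'TIme_In', 'TIme_out']])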
– Trenton McKinney

Here is a suggestion using broken_barh, but you need to reshape your dataset first:

import pandas as pd

tmp = pd.read_csv("master_df.csv")

df = (
    tmp.filter(regex="Date|TIme")
        .pipe(lambda x: x.set_axis(
            x.columns.str.rsplit("_", n=1, expand=True), axis=1))
        .stack().droplevel(0).reset_index(names="Product")
        .assign(DateTime_In = lambda x:
                pd.to_datetime(x["Date"].str.cat(x.pop("TIme_In"), sep=" ")),
                DateTime_Out= lambda x:
                pd.to_datetime(x.pop("Date").str.cat(x.pop("TIme_out"), sep=" ")))
)

# to fix certain out dates
df["DateTime_Out"] = (
    df["DateTime_Out"].add(pd.Timedelta(days=1))
        .where(
            (df["DateTime_Out"].sub(df["DateTime_In"])).dt.days.eq(-1),
            df["DateTime_Out"]
        )
)
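
A quick sanity check on the reshaped frame (a minimal sketch, assuming the code above was run on master_df.csv):

# the long frame should now hold one row per interval with three columns:
# Product, DateTime_In, DateTime_Out
print(df[["Product", "DateTime_In", "DateTime_Out"]].head())
print(df["Product"].value_counts())  # number of intervals per product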

Then you can make your Gantt chart:

import matplotlib.pyplot as plt
import matplotlib.dates as mdates

fig, ax = plt.subplots()

groups, products = df.groupby("Product"), df["Product"].unique()

colors = {
    "A1": "#ff0000", "A2": "#92d050", "A3": "#00b0f0",
    "A4": "#ffff00", "A5": "#c55a11", "A6": "#7030a0",
    "A7": "#00b050"
}

height = 0.5

for y, (product, group) in enumerate(groups):
    durations = group["DateTime_Out"] - group["DateTime_In"]
    ax.broken_barh(list(zip(group["DateTime_In"], durations)), (y - height/2, height),
                   facecolors=colors.get(product),
                   edgecolor="blue", linewidth=0.5)

ax.set_ylim(-height, len(products) - 1 + height)
ax.set_xlim(df["DateTime_In"].min(), df["DateTime_Out"].max().ceil("H"))
ax.set_yticks(range(len(products)))
ax.set_yticklabels(sorted(products), fontsize=14)

ax.xaxis.set_major_locator(mdates.MinuteLocator(interval=30))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M:%S"))
fig.autofmt_xdate(rotation=90, ha="center")
fig.set_size_inches(15, 7)

ax.set_xlabel("Time", fontweight="bold", fontsize=14)
ax.set_ylabel("Product", fontweight="bold", fontsize=14)

ax.grid(axis="y", color="lightblue")

plt.show();

Output:

[Resulting Gantt chart]

– Timeless
  • Hi @Timeless. After running the first part of your code I got the following error: "TypeError: reset_index() got an unexpected keyword argument 'names'" – RRSC Jun 07 '23 at 14:59
  • That's because you're using a pandas version `<1.5.0`. The `names` parameter was added in [`1.5.0`](https://pandas.pydata.org/docs/whatsnew/v1.5.0.html#other-enhancements). Either update your pandas version or use `.reset_index().rename(columns={"level_1": "Product"})`. – Timeless Jun 07 '23 at 15:03
  • Hi @Timeless. Thanks for your kind help. Your solution is working fine, but the issue is that every time I run the program, the timing values at the x-axis ticks change. – RRSC Jun 07 '23 at 17:06
  • I think that's because you forgot to remove this chain/line `.sample(frac=0.3)`, can you check? – Timeless Jun 07 '23 at 17:12
  • After removing `.sample(frac=0.3)`, the x-axis values are no longer changing. Another issue I observed: in the CSV file, for product A1, after Time_Out_A1 = 04:00:00 the next Time_In_A1 value is 08:00:00, but in the plot this gap is not visible, even with the x-axis interval kept at 30 minutes. – RRSC Jun 07 '23 at 17:24
  • The plot does not match the timestamps in the dataframe for the respective products. – RRSC Jun 07 '23 at 17:47
  • I think I see what you mean, that's because of the last row/cycle of each product. I'll try to give it a fix. – Timeless Jun 07 '23 at 17:51
  • I updated the answer, can you check it out? – Timeless Jun 07 '23 at 17:57
  • Thanks @Timeless. Your solution is working, but I understand that you have removed the last row of each dataframe group. Is it possible to get the same answer without removing the last row? – RRSC Jun 07 '23 at 18:09
  • I'm not sure, but I think it would be possible only if we fix the date of the last row when, for some products, we move from `PM` to `AM` (i.e. the next day). For example, for the product `A1`, you have this in the last row: `7/6/2023|A1|11:48:00 PM|12:15:00 AM`, which is not 100% correct because the timing out is on `8/6/2023`. – Timeless Jun 07 '23 at 18:15
  • @RRSC I don't see how this is correct, since everything ends at 00:00, but the real data ends on the next day. – Trenton McKinney Jun 07 '23 at 18:26
  • @TrentonMcKinney, the data is indeed like that. Some of our products enter a process where 00:00:00 crossovers happen, i.e. they run into the next day. – RRSC Jun 07 '23 at 18:30
  • @RRSC that's what I mean, the plot doesn't show that. see this [plot](https://i.stack.imgur.com/lUH0V.png) – Trenton McKinney Jun 07 '23 at 18:32
  • @TrentonMcKinney, this was pointed out in [this](https://stackoverflow.com/questions/76424058/how-to-plot-a-gantt-chart-from-multiple-dataframe-columns/76424588#comment134762722_76424588) comment. The last timing of some of the products was excluded on purpose. How do you think we should handle that? Maybe by shifting the date of those rows/timings? – Timeless Jun 07 '23 at 18:34
  • If time_out is less than time_in, then add timedelta of 1 day – Trenton McKinney Jun 07 '23 at 18:36
  • I made an update to the answer/code to handle that. – Timeless Jun 07 '23 at 18:52
  • @Timeless, thanks for your updated solution. Now it is working as expected. – RRSC Jun 07 '23 at 19:00