I have the following data:

[Image: data sample]

I want to create a Gantt chart that represents a timeline in Python. I looked up another post with a similar problem (How to get gantt plot using matplotlib), but the code didn't work for me and I can't solve the issue on my own. It seems to have something to do with the data type of my "time" values. Here is the code:

import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('zpp00141_new.csv')
df.dropna(subset=['Latest finish / time', 'Earl. start / time'])
#error when I try to change data type of the columns to int
df["Latest finish / time"]= df["Latest finish / time"].astype(int) 
df["Earl. start / time"]= df["Earl. start / time"].astype(int)
#error below with data types
df["Diff"] = df['Latest finish / time'] - df['Earl. start / time']
color = {"In":"turquoise", "Out":"crimson"}
fig,ax=plt.subplots(figsize=(6,3))

labels=[]
for i, task in enumerate(df.groupby("Operation/Activity")):
    labels.append(task[0])
    for r in task[1].groupby("Operation short text"):
        data = r[1][["Earl. start / time", "Diff"]]
        ax.broken_barh(data.values, (i-0.4,0.8), color=color[r[0]] )

ax.set_yticks(range(len(labels)))
ax.set_yticklabels(labels) 
ax.set_xlabel("time [ms]")
plt.tight_layout()       
plt.show()

I tried to convert the columns from object to int, but that raised another error: "invalid literal for int() with base 10: '9:22:00 AM'". I would really appreciate any help, as I am quite new to programming in Python. If there is a simpler or better way to represent what I need, any tips would be welcome. Basically, I need a Gantt chart that shows each activity on a timeline from 7 am to 4:30 pm, with the current time drawn as a vertical line over the chart to indicate where we are now.

Max
  • Maybe [this example](https://stackoverflow.com/questions/59654501/gantt-chart-from-dictionary-with-lists-of-discrete-non-contiguous-dates-as-value/59665560#59665560) can help? – JohanC Feb 12 '20 at 16:28
  • Thanks for providing this example but my times are in a different format: h:min:sec AM/PM. And I need the graph to show the timeline from 7 am to 4:30 pm with hourly intervals. As a data input, I use csv file with start time and end time categorized by certain operation/activity. If you don't mind telling, how would I modify the code for these requirements? – Max Feb 12 '20 at 17:55

1 Answer

When the time strings are not in a standard format, datetime.strptime can be used to convert them. The code below checks whether the string starts with 1 or 2 digits and prepends a zero if needed, so every string has the zero-padded hour that '%I' is documented with. This is a precaution: Python's strptime also accepts an hour without the leading zero.
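As a minimal sketch of the conversion on its own, assuming the strings look like the '9:22:00 AM' value from the error message in the question: once both times are parsed, subtracting them gives the duration that the question's "Diff" column was meant to hold. The helper name parse_clock_time is made up for this illustration.

```python
from datetime import datetime


def parse_clock_time(timestr):
    # zfill(11) left-pads a one-digit hour with a zero
    # ('9:22:00 AM' -> '09:22:00 AM'); longer strings are unchanged.
    return datetime.strptime(timestr.zfill(11), '%I:%M:%S %p')


start = parse_clock_time('9:22:00 AM')
finish = parse_clock_time('12:15:00 PM')
duration = finish - start  # a timedelta, not an int
print(start.time(), finish.time(), duration)  # 09:22:00 12:15:00 2:53:00
```

The parsed values are datetime objects on the default date 1900-01-01, which is why they can be subtracted directly while astype(int) on the raw strings fails.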

Here is an example to get you started. I didn't fully grasp the code in the question, as some columns seem to be missing. I also renamed the columns to valid Python identifiers, so that the rows from itertuples() can be accessed as row.start instead of row[1].

Colors can be assigned to each operation just by creating a list of them. Matplotlib has some built-in colormaps that can be used. For example, 'tab10' has 10 different colors. The list can be repeated if there aren't enough colors for each individual operation.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from datetime import datetime
import math

def timestr_to_num(timestr):
    return mdates.date2num(datetime.strptime('0' + timestr if timestr[1] == ':' else timestr, '%I:%M:%S %p'))

df = pd.DataFrame({'start': ['7:00:00 AM', '1:00:00 PM', '7:20:00 AM', '2:00:00 PM'],
                   'finish': ['12:15:00 PM', '4:20:00 PM', '1:10:00 PM', '3:30:00 PM'],
                   'operation': ['operation 1', 'operation 1', 'operation 2', 'operation 3'],
                   'short_text': ['short text 1', 'short text 2', 'short text 1', 'short text 2']})
fig, ax = plt.subplots(figsize=(10, 3))
operations = pd.unique(df['operation'])
colors = plt.cm.tab10.colors  # get a list of 10 colors
colors *= math.ceil(len(operations) / (len(colors)))  # repeat the list as many times as needed
for operation, color in zip(operations, colors):
    for row in df[df['operation'] == operation].itertuples():
        left = timestr_to_num(row.start)
        right = timestr_to_num(row.finish)
        ax.barh(operation, left=left, width=right - left, height=0.8, color=color)
ax.set_xlim(timestr_to_num('07:00:00 AM'), timestr_to_num('4:30:00 PM'))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))  # display ticks as hours and minutes
ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))  # set a tick every hour
plt.tight_layout()
plt.show()

[Image: example plot]
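For the vertical "now" line mentioned in the question, one possible approach is sketched below. It assumes the bars were positioned with timestr_to_num as in the code above, so they all sit on 1900-01-01, the date strptime fills in when only a time is parsed. The current time therefore has to be moved onto that same date before it is converted. The helper names time_of_day_to_num and add_now_line are made up for this illustration.

```python
from datetime import date, datetime

import matplotlib.dates as mdates


def time_of_day_to_num(moment):
    # Keep the time of day but move it to 1900-01-01, the date that
    # strptime fills in when the format string contains no date.
    shifted = datetime.combine(date(1900, 1, 1), moment.time())
    return mdates.date2num(shifted)


def add_now_line(ax, now=None):
    # Hypothetical helper: draws a red vertical line at the current
    # time of day on an axis whose x values come from date2num.
    if now is None:
        now = datetime.now()
    return ax.axvline(time_of_day_to_num(now), color='red', linewidth=2)
```

Calling add_now_line(ax) before plt.tight_layout() should add the line to the chart. It is only visible when the current time falls inside the x-limits, here 7:00 to 16:30.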

JohanC
  • Thanks for the code, but what if I don't have the "In/out" column in my data? I only have the times. How do I assign the colors in that case? I would like each operation to have a different color, but I don't want to specify the color for each operation manually; I would prefer the color to be picked randomly or to be the same across all the operations. What would be different in the code in that case? – Max Feb 14 '20 at 15:51
  • Thank you very much, I was able to plot a gantt chart with your help! – Max Feb 17 '20 at 14:50
  • Could you by any chance advise on how to plot a vertical red line that corresponds to the current time on the x-axis? I tried using plt.axvline but I can't figure out how to convert the time so that it works with my formatting and data. Thanks a lot in advance! – Max Feb 17 '20 at 19:21