I have a dataframe like this which is an application log:

+---------+----------------+----------------+---------+----------+-------------------+------------+
|  User   | ReportingSubId | RecordLockTime | EndTime | Duration | DurationConverted | ActionType |
+---------+----------------+----------------+---------+----------+-------------------+------------+
| User 5  |             21 | 06:19.6        | 06:50.5 |       31 | 00:00:31          | Edit       |
| User 4  |             19 | 59:08.6        | 59:27.6 |       19 | 00:00:19          | Add        |
| User 25 |             22 | 29:09.4        | 29:37.0 |       28 | 00:00:28          | Edit       |
| User 10 |             19 | 28:36.9        | 33:37.0 |      300 | 00:05:00          | Add        |
| User 27 |             22 | 13:27.7        | 16:54.9 |      207 | 00:03:27          | Edit       |
| User 5  |             21 | 11:22.8        | 12:37.3 |       75 | 00:01:15          | Edit       |
+---------+----------------+----------------+---------+----------+-------------------+------------+

I wanted to visualize the duration of adds and edits for each user, and a Gantt chart seemed ideal.

I was able to do it for a sample dataframe of 807 rows with the following code:

# Imports as used in the plot.ly Gantt tutorial (online mode)
import plotly.plotly as py
import plotly.figure_factory as ff

# One dict per log row, in the format ff.create_gantt expects
data = []
for row in df_temp.itertuples():
    data.append(dict(Task=str(row.User), Start=str(row.RecordLockTime),
                     Finish=str(row.EndTime), Resource=str(row.ActionType)))

colors = {'Add': 'rgb(110, 244, 65)',
          'Edit': 'rgb(244, 75, 66)'}

fig = ff.create_gantt(data, colors=colors, index_col='Resource', show_colorbar=True, group_tasks=True)

# Custom hover text for every trace except the last two.
# `counts` is not shown here; it is assumed to hold the number of adds and edits per user.
template = ("User: {}<br>Start: {}<br>Finish: {}<br>Duration: {}"
            "<br>Number of Adds: {}<br>Number of Edits: {}")
for i in range(len(fig["data"]) - 2):
    user = df_temp["User"].loc[i]
    user_counts = counts[counts["User"] == user]
    text = template.format(user,
                           df_temp["RecordLockTime"].loc[i],
                           df_temp["EndTime"].loc[i],
                           df_temp["DurationConverted"].loc[i],
                           user_counts["Add"].iloc[0],
                           user_counts["Edit"].iloc[0])
    fig["data"][i].update(text=text, hoverinfo="text")

fig['layout'].update(autosize=True, margin=dict(l=150))
py.iplot(fig, filename='gantt-group-tasks-together', world_readable=True)

and I am more than happy with the result: https://plot.ly/~pawelty/90.embed

However, my original df has more users and 2500 rows in total. That seems to be too much for plotly: I get a 502 error.

I am a huge fan of plotly, but I might have reached its limit. Can I change something in order to visualize it with Plotly? Is there any other tool I could use?

pawelty
  • Have you tried running it on your computer rather than sending it to plotly for the visualization? – Jack Moody Jul 09 '18 at 16:45
  • Hmm, I ran it in Jupyter Lab and got the same result. – pawelty Jul 09 '18 at 16:59
  • What module are you using that is being called `ff`? – Jack Moody Jul 09 '18 at 17:06
  • I am following https://plot.ly/python/gantt/ the section about grouping tasks. The module is called plotly.figure_factory – pawelty Jul 09 '18 at 17:30
  • I can't find any limits on the functions within the [source code](https://github.com/plotly/plotly.py/blob/master/plotly/figure_factory/_gantt.py). Would you happen to know if you are using any part of the plotly API? If you are getting a 502 error, it seems like it might be on plotly's side. I'd check out matplotlib as an alternative. See [this answer](https://stackoverflow.com/questions/31820578/how-to-plot-stacked-event-duration-gantt-charts-using-python-pandas). – Jack Moody Jul 09 '18 at 17:48
  • Yeah but with Matplotlib I lose all the interactivity and sharing possibilities. Thanks for your help anyway. – pawelty Jul 09 '18 at 18:24
  • Can it be a limitation of a free account? – pawelty Jul 10 '18 at 08:21

1 Answer

I started using plotly.offline.plot(fig) to plot offline; it worked much faster and I got fewer errors. I still have the problem that my graph sometimes isn't displayed, or is displayed only in fullscreen mode.

Note that I import plotly instead of plotly.plotly; otherwise it doesn't work.
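
Here is a minimal sketch of that offline approach, in case it helps. It assumes a plotly version that still ships plotly.figure_factory.create_gantt and plotly.offline.plot. The helper names build_gantt_records and render_offline, the sample timestamps, and the output filename are all hypothetical and not taken from the question. The question's data only shows times like 06:19.6, so full timestamps are assumed here.

```python
from datetime import datetime, timedelta

# Same colors as in the question
COLORS = {"Add": "rgb(110, 244, 65)", "Edit": "rgb(244, 75, 66)"}


def build_gantt_records(rows):
    """Turn (user, start, duration_seconds, action) tuples into create_gantt dicts."""
    records = []
    for user, start, seconds, action in rows:
        finish = start + timedelta(seconds=seconds)
        records.append(dict(
            Task=user,
            Start=start.strftime("%Y-%m-%d %H:%M:%S"),
            Finish=finish.strftime("%Y-%m-%d %H:%M:%S"),
            Resource=action,
        ))
    return records


def render_offline(records, path="gantt-offline.html"):
    """Write the chart to a local HTML file instead of uploading it."""
    # Imported here so the data preparation above works without plotly installed.
    import plotly.figure_factory as ff
    import plotly.offline

    fig = ff.create_gantt(records, colors=COLORS, index_col="Resource",
                          show_colorbar=True, group_tasks=True)
    fig["layout"].update(autosize=True, margin=dict(l=150))
    # auto_open=False only writes the file; open it in a browser yourself.
    return plotly.offline.plot(fig, filename=path, auto_open=False)


# Hypothetical sample rows with full timestamps (assumed, not from the question)
sample_rows = [
    ("User 5", datetime(2018, 7, 9, 10, 6, 19), 31, "Edit"),
    ("User 4", datetime(2018, 7, 9, 10, 59, 8), 19, "Add"),
]
records = build_gantt_records(sample_rows)
```

Calling render_offline(records) should then write the HTML file locally without uploading the figure to plotly's servers, which is where the 502 in the question appears to come from. I have not checked how it behaves with 2500 rows.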

derhannes