0

I have a CSV acting as a log file/db. It is structured like so:

User1,Client3,Ops/Admin,19:33:46,19:33:57,178.054662,Notes,05/09/2018
User1,Client2,Ops/Admin,20:33:46,20:33:57,449.272576,Notes,05/09/2018
User1,Client3,Support Management,21:33:46,21:33:57,1073.425258,Notes,05/09/2018
User1,Client2,Support Management,22:33:46,22:33:57,290.640219,Notes,05/09/2018
User1,Client3,Ops/Admin,23:33:46,23:33:57,402.223927,Notes,05/09/2018
User1,Client1,Support Management,00:33:46,00:33:57,761.266062,Notes,05/09/2018
User1,Client1,Business Developement,01:33:46,01:33:57,1623.303656,Notes,05/09/2018

I want to make a line chart using matplotlib.pyplot which sums the total of col[5] for each customer, for each date. IE a different line on the same chart, where the x axis is date (col[7]) and the y axis is the sum of the value for that date (col[5]).

I am using pandas, this is my starting point:

import matplotlib.pyplot as plt
import pandas as pd
import datetime, csv

csv_file = pd.read_csv("file.csv",
                           names = ['USER',
                                   'CLIENT',
                                   'TYPE',
                                   'START',
                                   'END',
                                   'DURATION',
                                   'NOTES',
                                   'DATE'])

I have then tried either making the table as a pivot table:

date_pivot = csv_file.pivot_table('DURATION', index='CLIENT', columns='DATE')

or by using the .groupby() function:

dategroup = csv_file.groupby(['CLIENT','DATE'], as_index = False).sum()

Both of which seem to generate a table which looks as though it should be sufficient to use.

The problem is, for no lack of trying(...), I cannot figure out how to plot the summed values of CLIENT by DATE on a matplotlib.pyplot.plot()... Can you help?

My only thoughts I have so far:

  • Should I be, somehow, rearranging the tables so that clients run down the Y-axis (/index) and dates along the x-axis (/header) instead?
  • Will it require some form of def iteration function using .loc()?

Any help on this would be greatly appreciated so thanks in advance!

pseudo-c
  • 23
  • 3

2 Answers2

0

This splits the results of summing on client and date into groups based on client again, and plots a separate line for each client:

import matplotlib.pyplot as plt
import pandas as pd

df['DATE'] = pd.to_datetime(df['DATE'])
aggregated = df.groupby(['CLIENT', 'DATE']).sum().reset_index()

fig, ax = plt.subplots()

for key, group in aggregated.groupby(['CLIENT']):
    ax = group.plot(ax=ax, kind='line', x='DATE', y='DURATION', label=key)

plt.show()

This is based on the group plotting trick from this answer.

Andrey Portnoy
  • 1,430
  • 15
  • 24
  • This worked perfectly thanks. Though I added label=key as that was displaying wrong. Thank you very much! – pseudo-c Sep 08 '18 at 22:21
0
import numpy as np
csv_file=pd.read_csv('file.csv',names =   ['USER',
                               'CLIENT',
                               'TYPE',
                               'START',
                               'END',
                               'DURATION',
                               'NOTES',
                               'DATE'],parse_dates=['DATE'])
csv_file.pivot_table('DURATION',index='DATE',columns='CLIENT',aggfunc=np.sum)
  • x-axis/index/DATE
  • Multiple y-axis/columns/CLIENT
  • Aggregation function = sum
TheMaster
  • 45,448
  • 6
  • 62
  • 85