I have a CSV acting as a log file/db. It is structured like so:
User1,Client3,Ops/Admin,19:33:46,19:33:57,178.054662,Notes,05/09/2018
User1,Client2,Ops/Admin,20:33:46,20:33:57,449.272576,Notes,05/09/2018
User1,Client3,Support Management,21:33:46,21:33:57,1073.425258,Notes,05/09/2018
User1,Client2,Support Management,22:33:46,22:33:57,290.640219,Notes,05/09/2018
User1,Client3,Ops/Admin,23:33:46,23:33:57,402.223927,Notes,05/09/2018
User1,Client1,Support Management,00:33:46,00:33:57,761.266062,Notes,05/09/2018
User1,Client1,Business Developement,01:33:46,01:33:57,1623.303656,Notes,05/09/2018
I want to make a line chart using matplotlib.pyplot which sums the total of col[5] for each customer, for each date. IE a different line on the same chart, where the x axis is date (col[7]) and the y axis is the sum of the value for that date (col[5]).
I am using pandas, this is my starting point:
import matplotlib.pyplot as plt
import pandas as pd
import datetime, csv
csv_file = pd.read_csv("file.csv",
names = ['USER',
'CLIENT',
'TYPE',
'START',
'END',
'DURATION',
'NOTES',
'DATE'])
I have then tried either making the table as a pivot table:
date_pivot = csv_file.pivot_table('DURATION', index='CLIENT', columns='DATE')
or by using the .groupby() function:
dategroup = csv_file.groupby(['CLIENT','DATE'], as_index = False).sum()
Both of which seem to generate a table which looks as though it should be sufficient to use.
The problem is, for no lack of trying(...), I cannot figure out how to plot the summed values of CLIENT by DATE on a matplotlib.pyplot.plot()... Can you help?
My only thoughts I have so far:
- Should I be, somehow, rearranging the tables so that clients run down the Y-axis (/index) and dates along the x-axis (/header) instead?
- Will it require some form of def iteration function using .loc()?
Any help on this would be greatly appreciated so thanks in advance!