
I have a set of data that I want to plot. I have a list of timestamps that I want to group per hour, and then show the number of points per hour in a line graph. The data spans multiple days, and I want one graph per day.

I have the number of points per hour and the hours in which they occur, but I cannot get a line to show in my graph, and I think I am missing a simple solution. I have posted a picture as well so you can see the output. What is the next step to get the line to show?

I have the following code:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import csv
from datetime import timedelta
import datetime as dt
 
data= pd.read_csv('test2.csv', header=0, index_col=None, parse_dates=True, sep=';', usecols=[0,1])
df=pd.DataFrame(data, columns=['Date', 'Time'])
df['DateTime'] = df['Date'] + df['Time']

#for date in df['DateTime']:


def RemoveMilliSeconds(x):
    return x[:-5]

df['Time'] = df['Time'].apply(RemoveMilliSeconds)

df['DateTime'] = df['Date'] + df['Time']
df['DateTime'] = pd.to_datetime(df['DateTime'], format="%Y:%m:%d %H:%M:%S")
df['TimeDelta'] = df.groupby('Date')['DateTime'].apply(lambda x: x.diff())

#print(df['TimeDelta'] / np.timedelta64(1, 'h'))
df['HourOfDay'] = df['DateTime'].dt.hour
df['Day'] = df['DateTime'].dt.day

grouped_df = df.groupby('Day')

for key, item in grouped_df:
    print(grouped_df.get_group(key)['HourOfDay'].value_counts(), "\n\n")


res=[]
for i in df['DateTime'].dt.hour:
    if i not in res:
        res.append(i)
print("enkele lijst:" + str(res))
#range = (0,24)
#bins = 2
#plt.hist(df['DateTime'].dt.hour, bins, range)

x=np.array([res])

y=np.array([df['HourOfDay'].value_counts()])
plt.plot(x,y)
plt.show()

#times = pd.DatetimeIndex(df.Time)
#grouped = df.groupby([times.hour])

[image: the picture that shows the output]

My sample data:

Date;Time
2020:02:13 ;12:39:02:913 
2020:02:13 ;12:39:42:915 
2020:02:13 ;13:06:20:718 
2020:02:13 ;13:18:25:988 
2020:02:13 ;13:34:02:835 
2020:02:13 ;13:46:35:793 
2020:02:13 ;13:59:10:659 
2020:02:13 ;14:14:33:571 
2020:02:13 ;14:25:36:381 
2020:02:13 ;14:35:38:342 
2020:02:13 ;14:46:04:006 
2020:02:13 ;14:56:57:346 
2020:02:13 ;15:07:39:752 
2020:02:13 ;15:19:44:868 
2020:02:13 ;15:32:31:438 
2020:02:13 ;15:44:44:928 
2020:02:13 ;15:56:54:453 
2020:02:13 ;16:08:21:023 
2020:02:13 ;16:19:17:620 
2020:02:13 ;16:29:56:944 
2020:02:13 ;16:40:11:132 
2020:02:13 ;16:49:12:113 
2020:02:13 ;16:57:26:652 
2020:02:13 ;16:57:26:652 
2020:02:13 ;17:04:22:092 
2020:02:17 ;08:58:08:562 
2020:02:17 ;08:58:42:545 
Mr. T
peter
  • Any chance you could provide the `test2.csv` file or part of it, so that we can test your code more easily? – OctaveL Nov 20 '20 at 10:56
  • Simply copying the entire code here rarely attracts good answers. I suggest you read [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) - often in the process of refining the question, the solution becomes more obvious. – Mr. T Nov 20 '20 at 11:26

1 Answer

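You did not prepare your x-y data in a way that lets matplotlib understand their relationship. Wrapping the values in np.array([...]) produces 2D arrays of shape (1, n), which matplotlib treats as n separate series with one point each, so no line is drawn.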

The easy "answer" would be to plot res and df['HourOfDay'].value_counts() directly against each other:

#.....
#range = (0,24)
#bins = 2
#plt.hist(df['DateTime'].dt.hour, bins, range)

plt.plot(res, df['HourOfDay'].value_counts())
plt.show()

But the sample output shows you the problem: [image: sample output]

matplotlib connects the points in the order given and does not sort the x-values for you (doing so would misrepresent the data in other contexts). So we have to sort them before plotting:

#.....
#range = (0,24)
#bins = 2
#plt.hist(df['DateTime'].dt.hour, bins, range)

xy=np.stack((res, df['HourOfDay'].value_counts()))
xy = xy[:, np.argsort(xy[0,:])]
plt.plot(*xy)
plt.show()

Now, the x-values are in the correct order, and the y-values have been sorted with them in the combined xy array that we created for this purpose:

[image: sample output]
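To see what the stack-and-argsort step does to the x-y pairs, here is a small toy run. The hour and count values are made up for illustration and are not taken from the question's file.

```python
import numpy as np

# Hypothetical hours (x) and counts (y), deliberately not in ascending x order.
x = np.array([12, 13, 16, 17, 8])
y = np.array([2, 5, 7, 1, 2])

# Stack into a 2 x n array: row 0 holds x, row 1 holds y.
xy = np.stack((x, y))

# argsort on row 0 gives the column order that sorts x ascending;
# applying it to all rows moves each y along with its x.
order = np.argsort(xy[0, :])
xy_sorted = xy[:, order]

print(xy_sorted)
```

Each column stays intact, so a count is never separated from the hour it was paired with before sorting.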

Obviously, it would be better to prepare res and df['HourOfDay'].value_counts() consistently from the start, so we don't have to create a combined array to sort them together. Since you did not explain what your code is supposed to do, we can only fix the problem after the fact. You should structure the code differently so that this problem does not occur in the first place, but only you (or someone who understands the intention of your code, which I don't) can do that.
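One possible way to prepare the data directly, as a sketch only: value_counts() returns the counts with the hour as index, ordered by frequency by default, so that order may not line up with a separately built list such as res. Calling sort_index() on the result gives hours and counts that are paired and ascending. The toy series and the helper name plot_counts below are assumptions for illustration, not part of the original code.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy hour-of-day values standing in for df['HourOfDay'] (hypothetical data).
hour_of_day = pd.Series([12, 12, 13, 13, 13, 16, 16, 16, 16, 17, 8, 8])

# value_counts() orders by frequency; sort_index() reorders by hour,
# so the index (x) and the values (y) stay paired and ascending.
counts = hour_of_day.value_counts().sort_index()


def plot_counts(counts):
    """Hypothetical helper: draw counts per hour as a line and return the Axes."""
    fig, ax = plt.subplots()
    ax.plot(counts.index, counts.values, marker="o")
    ax.set_xlabel("Hour of day")
    ax.set_ylabel("Number of points")
    return ax
```

Calling plot_counts(counts) followed by plt.show() would display the line. Because x and y come from the same Series, no separate list of hours is needed.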

I also suggest spending some time with the instructive matplotlib tutorials - this time is not wasted.

Update
It seems you are trying to create a subplot for each day and count the number of entries per hour. I would approach it like this (though I am sure some pandas experts have better ways of doing this):

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
 
#read your data and create datetime index
df= pd.read_csv('test1.txt', sep=";") 
df.index = pd.to_datetime(df["Date"]+df["Time"].str[:-5], format="%Y:%m:%d %H:%M:%S")

#group by date and hour, count entries
dfcounts = df.groupby([df.index.date, df.index.hour]).size().reset_index()
dfcounts.columns = ["Date", "Hour", "Count"]
maxcount = dfcounts.Count.max()

#group by date for plotting
dfplot = dfcounts.groupby(dfcounts.Date)

#plot each day into its own subplot
#squeeze=False keeps axs two-dimensional, so this also works for a single day
fig, axs = plt.subplots(dfplot.ngroups, squeeze=False, figsize=(6,8))

for ax, (groupdate, group) in zip(axs.flat, dfplot):
    #the marker is not really necessary but has been added in case there is just one entry per day
    ax.plot(group.Hour, group.Count, color="blue", marker="o")
    ax.set_title(str(groupdate))
    ax.set_xlim(0, 24)
    ax.set_ylim(0, maxcount * 1.1)
    ax.xaxis.set_ticks(np.arange(0, 25, 2))

plt.tight_layout()
plt.show()

Sample output: [image: sample output]
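Note that grouping and counting only yields rows for hours that contain at least one entry, so the line is drawn straight across hours without data. If those hours should appear as zero instead, one option is to reindex each day's counts over all 24 hours. This is an optional variation, not part of the approach above. The helper name fill_missing_hours and the toy table (with dates as plain strings) are assumptions for illustration.

```python
import pandas as pd

# Hypothetical per-day counts in the same column layout as dfcounts.
dfcounts = pd.DataFrame({
    "Date": ["2020-02-13", "2020-02-13", "2020-02-17"],
    "Hour": [12, 13, 8],
    "Count": [2, 5, 2],
})


def fill_missing_hours(day_counts):
    """Return counts for hours 0-23, using 0 for hours without entries."""
    return (day_counts.set_index("Hour")["Count"]
            .reindex(range(24), fill_value=0))


# One 24-element Series per day, keyed by date.
full_days = {date: fill_missing_hours(group)
             for date, group in dfcounts.groupby("Date")}
```

Each Series in full_days could then be passed to ax.plot(series.index, series.values) in place of the Hour and Count columns.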

Update 2
To plot them into individual figures, you can modify the loop:

#...
dfplot = dfcounts.groupby(dfcounts.Date)

for groupdate in dfplot.groups:
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
    fig.suptitle("Date:"+str(groupdate), fontsize=16)

    #scaled for comparability among graphs
    ax1.plot(dfplot.get_group(groupdate).Hour, dfplot.get_group(groupdate).Count, color="blue", marker="o")
    ax1.set_xlim(0, 24)
    ax1.xaxis.set_ticks(np.arange(0, 25, 2))
    ax1.set_ylim(0, maxcount * 1.1)
    ax1.set_title("comparable version")

    #scaled to maximize visibility per day
    ax2.plot(dfplot.get_group(groupdate).Hour, dfplot.get_group(groupdate).Count, color="red", marker="x")
    ax2.set_xlim(0, 24)
    ax2.xaxis.set_ticks(np.arange(0, 25, 2))
    ax2.set_title("expanded version")
    
    plt.tight_layout()
    #save optionally 
    #plt.savefig("MyDataForDay"+str(groupdate)+".eps")

print("All figures generated")
plt.show()

Sample output for one of the days: [image: sample output]

created with the following test data:

Date;Time
2020:02:13 ;12:39:02:913 
2020:02:13 ;12:39:42:915 
2020:02:13 ;13:06:20:718 
2020:02:13 ;13:18:25:988 
2020:02:13 ;13:34:02:835 
2020:02:13 ;13:46:35:793 
2020:02:13 ;13:59:10:659 
2020:02:13 ;14:14:33:571 
2020:02:13 ;14:25:36:381 
2020:02:13 ;14:35:38:342 
2020:02:13 ;14:46:04:006 
2020:02:13 ;14:56:57:346 
2020:02:13 ;15:07:39:752 
2020:02:13 ;15:19:44:868 
2020:02:13 ;15:32:31:438 
2020:02:13 ;15:44:44:928 
2020:02:13 ;15:56:54:453 
2020:02:13 ;16:08:21:023 
2020:02:13 ;16:19:17:620 
2020:02:13 ;16:29:56:944 
2020:02:13 ;16:40:11:132 
2020:02:13 ;16:49:12:113 
2020:02:13 ;16:57:26:652 
2020:02:13 ;16:57:26:652 
2020:02:13 ;17:04:22:092 
2020:02:17 ;08:58:08:562 
2020:02:17 ;08:58:42:545 
2020:02:17 ;15:19:44:868 
2020:02:17 ;17:32:31:438 
2020:02:17 ;17:44:44:928 
2020:02:17 ;17:56:54:453 
2020:02:17 ;18:08:21:023 
2020:03:19 ;06:19:17:620 
2020:03:19 ;06:29:56:944 
2020:03:19 ;06:40:11:132 
2020:03:19 ;14:49:12:113 
2020:03:19 ;16:57:26:652 
2020:03:19 ;16:57:26:652 
2020:03:19 ;17:04:22:092 
2020:03:19 ;18:58:08:562 
2020:03:19 ;18:58:42:545 
Mr. T
  • Thanks! I have data of multiple days, where I need a graph per day. The test2 file was day one plus a part of day two. This gave two graphs, one for day 1 and one for day 2. The test2 file is a small part of the total file, which contains over 40 days with around 20k points. I need a plot for each day, which is why it was grouped per day. Your code helps a lot! The only thing I see is that the 8 o'clock point is from the other day but was added to this graph. Can I use: for key, item in grouped_df: values = grouped_df.get_group(key)['HourOfDay'].value_counts() ? – peter Nov 20 '20 at 13:07
  • thank you for your help! The reason for this data is that we want to analyze data we gathered from a certain machine. The timestamp is doing something and we want to see how many times it happens and when. I used your script to analyze the data and it works very well, but I have one question: I have over 50 days and since this script puts it in a subplot, the plots are too small to see. Is there a way to make a plot per day, with the date as title? That would make it possible to read it from the screen. – peter Nov 23 '20 at 07:45
  • Thank you very much! You have saved me a lot of time @Mr. T! I followed your instructions about doing the tutorial, it was very helpful. Do you know where I can find more advanced problems to train with? – peter Nov 23 '20 at 12:41
  • I was also wondering if it is possible to put a simple threshold within the graphs. For example: all the lines of one month in plot, one line for each day, where all lines that don't exceed 20 will be green, all lines that do exceed 20 red, with a different pattern for each line. – peter Nov 23 '20 at 12:42
  • My recommendation here (and in every other aspect of life): Don't train for the sake of training, train with real problems. All I have learned about Python (not much) came from answering SO questions, learning from other people's answers here, and reading documentations. So, yes, your real-world problem of combining the plots is not very difficult - you can do that. If you have a specific question where you get stuck in the implementation, ask another question - people here are #SOreadytohelp. But remember: SO is not a free coding service. – Mr. T Nov 23 '20 at 13:09