1

This should be very easy, but I'm having several issues. The thing is, I want to do something like this post, but (1) I have a datetime field, so I have the hour, minutes and seconds in my date column, (2) I want to plot a line graph by day.

So, this is my data:

date                   col1        col2
2020-01-01 00:01:020   20          500
2020-01-02 00:01:020   10          500
2020-01-02 00:01:000   20          500
2020-01-02 00:01:021   20          500
2020-02-05 20:11:010   30          500
2020-02-05 10:01:020   10          500
.
.
.

So, as I mentioned above, what I want is to plot the daily average of col1. I started with this:

df.groupby('date')['col1'].mean()

That didn't work because of the hours, minutes and seconds. Later, I tried this:

df["day"] = df["date"].dt.day
df.groupby("day")["col1"].mean().plot(kind="line")

I almost did it, but the column day is not actually the day, but a number which represents the position of the day in the year, I guess. So any ideas on how to make this plot?

vestland
  • 55,229
  • 37
  • 187
  • 305
dummmmf
  • 63
  • 5

3 Answers3

3

IIUC:

groupby date instead of day:

df.groupby(df['date'].dt.date)["col1"].mean().plot(kind="line",rot=25)
#you don't need to create a column date for this directly pass date in groupby()

OR

df.groupby(df['date'].dt.normalize())["col1"].mean().plot(kind="line",rot=25)

Optional(you can also do this by these 2 but the above 2 fits best for your data and condition since the below ones will create unnecessary dates and NaN's):

#via pd.Grouper():
df.groupby(pd.Grouper(key='date',freq='1D'))["col1"].mean().dropna().plot(kind="line")
#OR
#via dt.floor():
df.groupby(df['date'].dt.floor('1D'))["col1"].mean().dropna().plot(kind="line")

output(for given sample data):

enter image description here

Anurag Dabas
  • 23,866
  • 9
  • 21
  • 41
  • 1
    Two wonderful answers, thank you so much!!!! Regarding the GROUPER answer, I'm getting this error: "TypeError: 'TimeGrouper' object is not subscriptable". Any idea about what's happening? – dummmmf Jul 11 '21 at 13:54
  • @dummmmf ohh sorry I forgot to add `)` so updated answer....kindly have a look...btw use 1st method as it will be faster in your case ***:)*** – Anurag Dabas Jul 11 '21 at 14:13
1

Since this question has seaborn and plotly tags as well,

sns.lineplot performs this operation without the need for groupby mean as the default estimator will compute the mean value per x instance. To remove error shading set ci=None.

Imports and setup:

import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

df = pd.DataFrame({
    'date': ['2020-01-01 00:01:020', '2020-01-02 00:01:020',
             '2020-01-02 00:01:000', '2020-01-02 00:01:021',
             '2020-02-05 20:11:010', '2020-02-05 10:01:020'],
    'col1': [20, 10, 20, 20, 30, 10],
    'col2': [500, 500, 500, 500, 500, 500]
})
df['date'] = pd.to_datetime(df['date'])

Plotting Code:

# Seaborn Line Plot x is the date, y is col1 default estimator is mean
ax = sns.lineplot(data=df, x=df['date'].dt.date, y='col1', ci=None)
ax.tick_params(axis='x', rotation=45)  # Make X ticks easier to read
plt.tight_layout()
plt.show()

plot 1 seaborn


For plotly take the groupby mean and create a px.line.

Imports and setup:

import pandas as pd
import plotly.express as px

df = pd.DataFrame({
    'date': ['2020-01-01 00:01:020', '2020-01-02 00:01:020',
             '2020-01-02 00:01:000', '2020-01-02 00:01:021',
             '2020-02-05 20:11:010', '2020-02-05 10:01:020'],
    'col1': [20, 10, 20, 20, 30, 10],
    'col2': [500, 500, 500, 500, 500, 500]
})
df['date'] = pd.to_datetime(df['date'])

Plotting code:

plot_values = df.groupby(df['date'].dt.date)["col1"].mean()
fig = px.line(plot_values)
fig.show()

plot 2

Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
0

What do you want exactly? the date without time? try this:

df["day"] = df["date"].apply(lambda l: l.date())