I have a pandas DataFrame that contains a value that was logged every few minutes.

import pandas as pd
import numpy as np
df = pd.DataFrame()
df['Time'] = pd.date_range("2018-01-01", periods=1000, freq="5Min")
df['Value'] = np.random.randint(1, 6, df.shape[0])

Now I want to make a boxplot showing the distribution per day. Normally I would use resample or groupby, but I don't know how to feed the resulting groups back into seaborn for the boxplot, or how to run other statistics on them.

Right now I use a very ugly workaround to turn the groups back into a DataFrame and flip it so that the days are columns:

daily = df.groupby(pd.Grouper(key='Time', freq='1D'))
df_days = daily['Value'].apply(lambda df: df.reset_index(drop=True)).unstack().transpose()

df_days can then be fed into seaborn.boxplot to generate the box-and-whisker plots.

Is there an easier way to get the DataFrame df_days?

Thanks

RaJa

1 Answer

Since your data is in the long form already, seaborn is the correct choice. You can use either dt.normalize() or dt.date to get the dates:

import seaborn as sns
sns.boxplot(y=df['Value'], x=df['Time'].dt.date)

Output: [image: box plot of Value, one box per date]
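The same daily key also covers the "other statistics" part of the question without reshaping anything. A minimal sketch, assuming the sample data from the question (the seed and the name `daily_stats` are only there for illustration):

```python
import numpy as np
import pandas as pd

# Sample data as in the question; the seed is an assumption, used only for reproducibility
rng = np.random.default_rng(0)
df = pd.DataFrame({"Time": pd.date_range("2018-01-01", periods=1000, freq="5min")})
df["Value"] = rng.integers(1, 6, len(df))

# dt.normalize() keeps datetime64 values set to midnight;
# dt.date would give plain Python date objects instead
day = df["Time"].dt.normalize()

# Per-day summary statistics, computed directly on the long-form data
daily_stats = df.groupby(day)["Value"].describe()
print(daily_stats)
```

Passing `day` as the `x` argument of `sns.boxplot` should work the same way as `dt.date`, although the tick labels will then show full timestamps.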

Quang Hoang
  • That solves the plot question. Thanks. But do you know a trick to get the df_days DataFrame as well? Sometimes I also want to do some math with this data. – RaJa Mar 17 '21 at 13:47
  • Read the Q/A 10 *pivot with two columns* in [this guide](https://stackoverflow.com/questions/47152691/how-to-pivot-a-dataframe). – Quang Hoang Mar 17 '21 at 13:50
  • Nice guide. Q/A 10 almost does the trick, but I have to insert another column using `dt.date` again, otherwise I lose the daily grouping. Anyway, it solves my problem. – RaJa Mar 17 '21 at 14:14
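
For later readers, one way to read the approach from these comments is sketched below. It is not necessarily the exact code RaJa ended up with: the column names `Date` and `Sample` are made up for illustration, and it assumes a `cumcount`-based pivot is what the guide's Q/A 10 describes.

```python
import numpy as np
import pandas as pd

# Sample data as in the question; the seed is an assumption, used only for reproducibility
rng = np.random.default_rng(0)
df = pd.DataFrame({"Time": pd.date_range("2018-01-01", periods=1000, freq="5min")})
df["Value"] = rng.integers(1, 6, len(df))

# Extra column holding the calendar day (the dt.date step mentioned in the comment)
df["Date"] = df["Time"].dt.date

# Position of each sample within its day: 0, 1, 2, ...
df["Sample"] = df.groupby("Date").cumcount()

# Wide form: one column per day, one row per sample position within the day
df_days = df.pivot(index="Sample", columns="Date", values="Value")
print(df_days.head())
```

The result has the same layout as the `df_days` from the question, except that the column labels are `date` objects rather than timestamps. Days with fewer samples (here the last one) are padded with NaN, so those columns come out as floats.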