Here's an example in pandas. I don't know what your histogram is supposed to look like, but I can get you up to the point where it sounds like you'll be able to create it yourself (there's a small plotting sketch at the very end).
First, convert your data to a dictionary so pandas can load it:
import pandas as pd
import numpy as np
import random

df = pd.DataFrame({'Time': ['2020-01-01, 09:00:01', '2020-01-01, 09:01:01', '2020-01-01, 09:02:01', '2020-01-01, 09:03:01', '2020-01-01, 09:04:01', '2020-01-01, 09:05:01', '2020-01-01, 09:06:01', '2020-01-01, 09:07:01', '2020-01-01, 09:08:01', '2020-01-01, 09:09:01', '2020-01-01, 09:10:01', '2020-01-01, 09:11:01', '2020-01-01, 09:12:01', '2020-01-01, 09:13:01', '2020-01-01, 09:14:01'], 'Value': [68, 68, 69, 68, 69, 70, 71, 75, 72, 73, 72, 72, 73, 72, 72]})

# parse the timestamp strings into datetimes so Timedelta arithmetic works later
df['Time'] = pd.to_datetime(df['Time'], format='%Y-%m-%d, %H:%M:%S')
I only grabbed a few of your points, but I'm going to extrapolate outwards to demonstrate how you can create a list of DataFrames grouped by the hour:
# add 166 one-minute rows (about 2 hr 46 min more) so the data spans roughly 3 hours
for i in range(1, 167):
    new_row = {'Time': df['Time'].iloc[-1] + pd.Timedelta(minutes=1),
               'Value': random.choice(df['Value'].tolist())}
    # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
    df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
These are the functions you'll want to use once your data is converted to a pandas DataFrame:
def get_time_mask(start_time, df_time):
    # this will be a range of 61 values (i.e. 9:00 to 10:00, including 10:00)
    # if you don't want to include the on-the-hour end point, change to minutes = 59
    end_time = start_time + pd.Timedelta(minutes = 60)
    return (df_time >= start_time) & (df_time <= end_time)
def group_dfs(df):
    dfs_grouped = []
    # if not including the on-the-hour end point, change to 60
    while len(df) > 61:
        # select a chunk of the df from its starting point out to 60 minutes
        time_mask = get_time_mask(df.head(1)['Time'].values[0], df['Time'])
        # append that chunk to the list
        dfs_grouped.append(df[time_mask])
        # set the data frame equal to everything we did not grab
        df = df[np.invert(time_mask)]
    # the while loop repeats until the data frame has fewer rows than the hour interval;
    # grab whatever remains, if anything, and append it to the list
    if len(df) != 0:
        dfs_grouped.append(df)
    return dfs_grouped
def apply_category(val):
    categories = pd.Series(['low', 'medium', 'high', 'ultra', 'critical'])
    conditions = [val <= 59, 60 <= val <= 69, 70 <= val <= 79, 80 <= val <= 89, val >= 90]
    return categories[conditions].values[0]
def categorize(df):
    df['Category'] = df['Value'].apply(apply_category)
    return df
def get_midpoints(df):
    # mean 'Value' per category within one hourly group
    return df.groupby('Category')['Value'].mean()
dfs_grouped = group_dfs(df)

# add the 'Category' column to each hourly group in place
for grouped in dfs_grouped:
    grouped['Category'] = grouped['Value'].apply(apply_category)

dfs_midpoint = [get_midpoints(grouped) for grouped in dfs_grouped]
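As a side note, if you prefer a vectorized alternative to the row-wise apply for the categorization step, pd.cut can do the same binning. This is just an optional sketch, assuming your values are whole numbers so the bin edges line up with the conditions above:

# optional, vectorized alternative to the category loop above
# bins are right-inclusive, so (59, 69] -> 'medium', (69, 79] -> 'high', etc.
bins = [-np.inf, 59, 69, 79, 89, np.inf]
labels = ['low', 'medium', 'high', 'ultra', 'critical']
for grouped in dfs_grouped:
    # cast to plain strings so the later groupby behaves like the apply version
    grouped['Category'] = pd.cut(grouped['Value'], bins=bins, labels=labels).astype(str)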
Output:
dfs_grouped:
# a list of length 3; each element is a DataFrame
[ Time Value
0 2020-01-01 09:00:01 68
1 2020-01-01 09:01:01 68
2 2020-01-01 09:02:01 69
3 2020-01-01 09:03:01 68
4 2020-01-01 09:04:01 69
.. ... ...
56 2020-01-01 09:56:01 71
57 2020-01-01 09:57:01 72
58 2020-01-01 09:58:01 73
59 2020-01-01 09:59:01 72
60 2020-01-01 10:00:01 73
[61 rows x 2 columns], Time Value
61 2020-01-01 10:01:01 72
62 2020-01-01 10:02:01 73
63 2020-01-01 10:03:01 73
64 2020-01-01 10:04:01 70
65 2020-01-01 10:05:01 72
.. ... ...
117 2020-01-01 10:57:01 73
118 2020-01-01 10:58:01 72
119 2020-01-01 10:59:01 72
120 2020-01-01 11:00:01 72
121 2020-01-01 11:01:01 73
[61 rows x 2 columns], Time Value
122 2020-01-01 11:02:01 69
123 2020-01-01 11:03:01 72
124 2020-01-01 11:04:01 72
125 2020-01-01 11:05:01 73
126 2020-01-01 11:06:01 73
127 2020-01-01 11:07:01 72
128 2020-01-01 11:08:01 72
129 2020-01-01 11:09:01 70
130 2020-01-01 11:10:01 72
131 2020-01-01 11:11:01 73
132 2020-01-01 11:12:01 73
133 2020-01-01 11:13:01 72
134 2020-01-01 11:14:01 73
135 2020-01-01 11:15:01 73
136 2020-01-01 11:16:01 72
137 2020-01-01 11:17:01 72
138 2020-01-01 11:18:01 72
139 2020-01-01 11:19:01 70
140 2020-01-01 11:20:01 73
141 2020-01-01 11:21:01 73
142 2020-01-01 11:22:01 72
143 2020-01-01 11:23:01 70
144 2020-01-01 11:24:01 72
145 2020-01-01 11:25:01 72
146 2020-01-01 11:26:01 72
147 2020-01-01 11:27:01 73
148 2020-01-01 11:28:01 72
149 2020-01-01 11:29:01 73
150 2020-01-01 11:30:01 73
151 2020-01-01 11:31:01 68
152 2020-01-01 11:32:01 70
153 2020-01-01 11:33:01 72
154 2020-01-01 11:34:01 72
155 2020-01-01 11:35:01 73
156 2020-01-01 11:36:01 72
157 2020-01-01 11:37:01 72
158 2020-01-01 11:38:01 72
159 2020-01-01 11:39:01 69
160 2020-01-01 11:40:01 72
161 2020-01-01 11:41:01 72
162 2020-01-01 11:42:01 72
163 2020-01-01 11:43:01 73
164 2020-01-01 11:44:01 73
165 2020-01-01 11:45:01 73
166 2020-01-01 11:46:01 72
167 2020-01-01 11:47:01 72
168 2020-01-01 11:48:01 72
169 2020-01-01 11:49:01 73
170 2020-01-01 11:50:01 70
171 2020-01-01 11:51:01 72
172 2020-01-01 11:52:01 75
173 2020-01-01 11:53:01 70
174 2020-01-01 11:54:01 72
175 2020-01-01 11:55:01 72
176 2020-01-01 11:56:01 72
177 2020-01-01 11:57:01 69
178 2020-01-01 11:58:01 72
179 2020-01-01 11:59:01 73
180 2020-01-01 12:00:01 73]
After the category loop, dfs_grouped:
[ Time Value Category
0 2020-01-01 09:00:01 68 medium
1 2020-01-01 09:01:01 68 medium
2 2020-01-01 09:02:01 69 medium
3 2020-01-01 09:03:01 68 medium
4 2020-01-01 09:04:01 69 medium
.. ... ... ...
56 2020-01-01 09:56:01 71 high
57 2020-01-01 09:57:01 72 high
58 2020-01-01 09:58:01 73 high
59 2020-01-01 09:59:01 72 high
60 2020-01-01 10:00:01 73 high
[61 rows x 3 columns], Time Value Category
61 2020-01-01 10:01:01 72 high
62 2020-01-01 10:02:01 73 high
63 2020-01-01 10:03:01 73 high
64 2020-01-01 10:04:01 70 high
65 2020-01-01 10:05:01 72 high
.. ... ... ...
117 2020-01-01 10:57:01 73 high
118 2020-01-01 10:58:01 72 high
119 2020-01-01 10:59:01 72 high
120 2020-01-01 11:00:01 72 high
121 2020-01-01 11:01:01 73 high
[61 rows x 3 columns], Time Value Category
122 2020-01-01 11:02:01 69 medium
123 2020-01-01 11:03:01 72 high
124 2020-01-01 11:04:01 72 high
125 2020-01-01 11:05:01 73 high
126 2020-01-01 11:06:01 73 high
127 2020-01-01 11:07:01 72 high
128 2020-01-01 11:08:01 72 high
129 2020-01-01 11:09:01 70 high
130 2020-01-01 11:10:01 72 high
131 2020-01-01 11:11:01 73 high
132 2020-01-01 11:12:01 73 high
133 2020-01-01 11:13:01 72 high
134 2020-01-01 11:14:01 73 high
135 2020-01-01 11:15:01 73 high
136 2020-01-01 11:16:01 72 high
137 2020-01-01 11:17:01 72 high
138 2020-01-01 11:18:01 72 high
139 2020-01-01 11:19:01 70 high
140 2020-01-01 11:20:01 73 high
141 2020-01-01 11:21:01 73 high
142 2020-01-01 11:22:01 72 high
143 2020-01-01 11:23:01 70 high
144 2020-01-01 11:24:01 72 high
145 2020-01-01 11:25:01 72 high
146 2020-01-01 11:26:01 72 high
147 2020-01-01 11:27:01 73 high
148 2020-01-01 11:28:01 72 high
149 2020-01-01 11:29:01 73 high
150 2020-01-01 11:30:01 73 high
151 2020-01-01 11:31:01 68 medium
152 2020-01-01 11:32:01 70 high
153 2020-01-01 11:33:01 72 high
154 2020-01-01 11:34:01 72 high
155 2020-01-01 11:35:01 73 high
156 2020-01-01 11:36:01 72 high
157 2020-01-01 11:37:01 72 high
158 2020-01-01 11:38:01 72 high
159 2020-01-01 11:39:01 69 medium
160 2020-01-01 11:40:01 72 high
161 2020-01-01 11:41:01 72 high
162 2020-01-01 11:42:01 72 high
163 2020-01-01 11:43:01 73 high
164 2020-01-01 11:44:01 73 high
165 2020-01-01 11:45:01 73 high
166 2020-01-01 11:46:01 72 high
167 2020-01-01 11:47:01 72 high
168 2020-01-01 11:48:01 72 high
169 2020-01-01 11:49:01 73 high
170 2020-01-01 11:50:01 70 high
171 2020-01-01 11:51:01 72 high
172 2020-01-01 11:52:01 75 high
173 2020-01-01 11:53:01 70 high
174 2020-01-01 11:54:01 72 high
175 2020-01-01 11:55:01 72 high
176 2020-01-01 11:56:01 72 high
177 2020-01-01 11:57:01 69 medium
178 2020-01-01 11:58:01 72 high
179 2020-01-01 11:59:01 73 high
180 2020-01-01 12:00:01 73 high]
And dfs_midpoint:
[Category
high 72.142857
medium 68.400000
Name: Value, dtype: float64, Category
high 72.189655
medium 68.333333
Name: Value, dtype: float64, Category
high 72.163636
medium 68.750000
Name: Value, dtype: float64]
To access elements of dfs_midpoint, keep in mind it's a list of pandas Series (not numpy arrays), so if I wanted the high value from the first group:
dfs_midpoint[0]['high']
> 72.14285714285714
# or, by position
dfs_midpoint[0].iloc[0]
> 72.14285714285714
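From here you should have everything you need for the plot. Since I don't know exactly what your histogram should look like, here's just one possible sketch (assuming matplotlib is available and you want one panel per hourly group with a bar at each category's mean value):

import matplotlib.pyplot as plt

# one subplot per hourly group, one bar per category midpoint
fig, axes = plt.subplots(1, len(dfs_midpoint), figsize=(4 * len(dfs_midpoint), 3), sharey=True)
for ax, midpoints in zip(axes, dfs_midpoint):
    ax.bar(midpoints.index, midpoints.values)
    ax.set_xlabel('Category')
    ax.set_ylabel('Mean Value')
plt.tight_layout()
plt.show()

Swap the bars for whatever chart type you actually need; the grouped DataFrames and the per-category midpoints are the pieces that matter.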