I am new to Python and I am trying to create a database that is similar to [this][1] using pandas.
The head of h3_15mins looks like this:
Timestamp Kettle Electric Heater Laptop Projector Aggregated Energy Consumption
0 2013-02-27 00:00:00 2.0 2.0 0.03 201.289993 4.36
1 2013-02-27 00:15:00 2.0 2.0 0.03 210.070007 4.38
2 2013-02-27 00:30:00 2.0 2.0 0.02 207.779999 4.35
3 2013-02-27 00:45:00 2.0 2.0 0.03 151.960007 4.34
4 2013-02-27 01:00:00 2.0 2.0 0.04 0.000000 4.41
... ... ... ... ... ... ...
3836 2013-04-07 23:00:00 NaN NaN NaN NaN NaN
3837 2013-04-07 23:15:00 NaN NaN NaN NaN NaN
3838 2013-04-07 23:30:00 NaN NaN NaN NaN NaN
3839 2013-04-07 23:45:00 NaN NaN NaN NaN NaN
3840 2013-04-08 00:00:00 NaN NaN NaN NaN NaN
3841 rows × 6 columns
Any idea on how I can use h3_15min to get a result similar to the output below? M (morning), A (afternoon), E (evening) should be assigned if the appliance is active during the respective time (e.g. M = 06:00-12:00, A = 12:00-18:00, E = 18:00-06:00) in h3_15mins.
Appliance Hour (of Day) Time (of Day)
0 Kettle 11:15 "M,A,E"
1 Electric Heater 16:00 "M"
2 Laptop 11:45 "A"
3 Projector 21:00 "M,A"
I managed to get the Hour (of Day) column filled using this code
# Create a new empty dataframe to store the appliance time association data
h3_clustering_I = pd.DataFrame(columns = ["Appliance", "Hour (of Day)", "Time (of Day)"])
# Add the appliances
listOfSeries = [pd.Series(["Kettle", 0, 0 ], index = h3_clustering_I.columns),
pd.Series(["Electric Heater", 0, 0], index = h3_clustering_I.columns),
pd.Series(["Laptop", 0, 0], index = h3_clustering_I.columns),
pd.Series(["Projector", 0, 0], index = h3_clustering_I.columns)]
h3_clustering_I.append(listOfSeries, ignore_index = True)
This resulted in the this:
Appliance Hour (of Day) Time (of Day)
0 Kettle 0 0
1 Electric Heater 0 0
2 Laptop 0 0
3 Projector 0 0
Then I created the code below to find the most common time active hour
# Create a new variable for the appliance with 15min interval
h3_kettle_15min = h3_15min.loc[:, ("Timestamp", "Kettle")]
h3_electricheater_15min = h3_15min.loc[:, ("Timestamp", "Electric Heater")]
h3_laptop_15min = h3_15min.loc[:, ("Timestamp", "Laptop")]
h3_projector_15min = h3_15min.loc[:, ("Timestamp", "Projector")]
# Only keep the hours
h3_kettle_15min["Timestamp"] = h3_kettle_15min["Timestamp"].dt.time
h3_electricheater_15min["Timestamp"] = h3_electricheater_15min["Timestamp"].dt.time
h3_laptop_15min["Timestamp"] = h3_laptop_15min["Timestamp"].dt.time
h3_projector_15min["Timestamp"] = h3_projector_15min["Timestamp"].dt.time
# Drop the datetimes where the appliances is missing values
h3_kettle_15min_complete = h3_kettle_15min.dropna(subset=["Kettle"])
h3_electricheater_15min_complete = h3_electricheater_15min.dropna(subset=["Electric Heater"])
h3_laptop_15min_complete = h3_laptop_15min.dropna(subset=["Laptop"])
h3_projector_15min_complete = h3_projector_15min.dropna(subset=["Projector"])
# Drop the datetimes where the appliances is not active
h3_kettle_15min_complete = h3_kettle_15min_complete[h3_kettle_15min_complete["Kettle"] !=0]
h3_electricheater_15min_complete = h3_electricheater_15min_complete[h3_electricheater_15min_complete["Electric Heater"] !=0]
h3_laptop_15min_complete = h3_laptop_15min_complete[h3_laptop_15min_complete["Laptop"] !=0]
h3_projector_15min_complete = h3_projector_15min_complete[h3_projector_15min_complete["Projector"] !=0]
# Get the max counts
h3_kettle_ch = h3_kettle_15min_complete["Timestamp"].value_counts().idxmax()
h3_electricheater_ch = h3_electricheater_15min_complete["Timestamp"].value_counts().idxmax()
h3_laptop_ch = h3_laptop_15min_complete["Timestamp"].value_counts().idxmax()
h3_projector_ch = h3_projector_15min_complete["Timestamp"].value_counts().idxmax()
I added these results to h3_clustering_I using the following code
# Add the appliances, common hour and time to the right column of the dataframe using variables
listOfSeries = [pd.Series(["Kettle", h3_kettle_ch, 0 ], index = h3_clustering_I.columns),
pd.Series(["Electric Heater", h3_electricheater_ch, 0], index = h3_clustering_I.columns),
pd.Series(["Laptop", h3_laptop_ch, 0], index = h3_clustering_I.columns),
pd.Series(["Projector", h3_projector_ch, 0], index = h3_clustering_I.columns)]
h3_clustering_I.append(listOfSeries, ignore_index = True)
Appliance Hour (of Day) Time (of Day) 0 Kettle 05:45:00 0 1 Electric Heater 05:45:00 0 2 Laptop 13:30:00 0 3 Projector 17:45:00 0
[1]: https://drive.google.com/file/d/1N3yvjdnMju5OsPAu-uVzRd4qpi8tLvo0/view?usp=sharing
[2]: https://stackoverflow.com/questions/39370879/extract-hour-from-timestamp-with-python?noredirect=1&lq=1
[3]: https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o