0

I am new to Python and I am trying to create a database that is similar to [this][1] using pandas.

The head of h3_15mins looks like this:

     Timestamp              Kettle  Electric Heater Laptop  Projector       Aggregated Energy Consumption
0    2013-02-27 00:00:00    2.0     2.0             0.03    201.289993  4.36
1    2013-02-27 00:15:00    2.0     2.0             0.03    210.070007  4.38
2    2013-02-27 00:30:00    2.0     2.0             0.02    207.779999  4.35
3    2013-02-27 00:45:00    2.0     2.0             0.03    151.960007  4.34
4    2013-02-27 01:00:00    2.0     2.0             0.04    0.000000    4.41
... ... ... ... ... ... ...
3836  2013-04-07 23:00:00   NaN     NaN             NaN     NaN         NaN
3837  2013-04-07 23:15:00   NaN     NaN             NaN     NaN         NaN
3838  2013-04-07 23:30:00   NaN     NaN             NaN     NaN         NaN
3839  2013-04-07 23:45:00   NaN     NaN             NaN     NaN         NaN
3840  2013-04-08 00:00:00   NaN     NaN             NaN     NaN         NaN
3841 rows × 6 columns

Any idea on how I can use h3_15min to get a result similar to the output below? M (morning), A (afternoon), E (evening) should be assigned if the appliance is active during the respective time (e.g. M = 06:00-12:00, A = 12:00-18:00, E = 18:00-06:00) in h3_15mins.

    Appliance           Hour (of Day)   Time (of Day)
0   Kettle              11:15           "M,A,E"
1   Electric Heater     16:00           "M"
2   Laptop              11:45           "A"
3   Projector           21:00           "M,A"

I managed to get the Hour (of Day) column filled using this code

# Create a new empty dataframe to store the appliance time association data
h3_clustering_I = pd.DataFrame(columns = ["Appliance", "Hour (of Day)", "Time (of Day)"])
 
# Add the appliances
listOfSeries = [pd.Series(["Kettle", 0, 0 ], index = h3_clustering_I.columns),
                pd.Series(["Electric Heater", 0, 0], index = h3_clustering_I.columns),
                pd.Series(["Laptop", 0, 0], index = h3_clustering_I.columns),
                pd.Series(["Projector", 0, 0], index = h3_clustering_I.columns)]
h3_clustering_I.append(listOfSeries, ignore_index = True)

This resulted in the this:

    Appliance           Hour (of Day)   Time (of Day)
0   Kettle              0               0
1   Electric Heater     0               0
2   Laptop              0               0
3   Projector           0               0

Then I created the code below to find the most common time active hour

# Create a new variable for the appliance with 15min interval
h3_kettle_15min = h3_15min.loc[:, ("Timestamp", "Kettle")]
h3_electricheater_15min = h3_15min.loc[:, ("Timestamp", "Electric Heater")]
h3_laptop_15min = h3_15min.loc[:, ("Timestamp", "Laptop")]
h3_projector_15min = h3_15min.loc[:, ("Timestamp", "Projector")]

# Only keep the hours
h3_kettle_15min["Timestamp"] = h3_kettle_15min["Timestamp"].dt.time
h3_electricheater_15min["Timestamp"] = h3_electricheater_15min["Timestamp"].dt.time
h3_laptop_15min["Timestamp"] = h3_laptop_15min["Timestamp"].dt.time
h3_projector_15min["Timestamp"] = h3_projector_15min["Timestamp"].dt.time

# Drop the datetimes where the appliances is missing values
h3_kettle_15min_complete = h3_kettle_15min.dropna(subset=["Kettle"])
h3_electricheater_15min_complete = h3_electricheater_15min.dropna(subset=["Electric Heater"])
h3_laptop_15min_complete = h3_laptop_15min.dropna(subset=["Laptop"])
h3_projector_15min_complete = h3_projector_15min.dropna(subset=["Projector"])

# Drop the datetimes where the appliances is not active
h3_kettle_15min_complete = h3_kettle_15min_complete[h3_kettle_15min_complete["Kettle"] !=0]
h3_electricheater_15min_complete = h3_electricheater_15min_complete[h3_electricheater_15min_complete["Electric Heater"] !=0]
h3_laptop_15min_complete = h3_laptop_15min_complete[h3_laptop_15min_complete["Laptop"] !=0]
h3_projector_15min_complete = h3_projector_15min_complete[h3_projector_15min_complete["Projector"] !=0]

# Get the max counts
h3_kettle_ch = h3_kettle_15min_complete["Timestamp"].value_counts().idxmax()
h3_electricheater_ch = h3_electricheater_15min_complete["Timestamp"].value_counts().idxmax()
h3_laptop_ch = h3_laptop_15min_complete["Timestamp"].value_counts().idxmax()
h3_projector_ch = h3_projector_15min_complete["Timestamp"].value_counts().idxmax()

I added these results to h3_clustering_I using the following code

# Add the appliances, common hour and time to the right column of the dataframe using variables
listOfSeries = [pd.Series(["Kettle", h3_kettle_ch, 0 ], index = h3_clustering_I.columns),
                pd.Series(["Electric Heater", h3_electricheater_ch, 0], index = h3_clustering_I.columns),
                pd.Series(["Laptop", h3_laptop_ch, 0], index = h3_clustering_I.columns),
                pd.Series(["Projector", h3_projector_ch, 0], index = h3_clustering_I.columns)]
h3_clustering_I.append(listOfSeries, ignore_index = True)

Appliance Hour (of Day) Time (of Day) 0 Kettle 05:45:00 0 1 Electric Heater 05:45:00 0 2 Laptop 13:30:00 0 3 Projector 17:45:00 0

  [1]: https://drive.google.com/file/d/1N3yvjdnMju5OsPAu-uVzRd4qpi8tLvo0/view?usp=sharing
  [2]: https://stackoverflow.com/questions/39370879/extract-hour-from-timestamp-with-python?noredirect=1&lq=1
  [3]: https://stackoverflow.com/questions/36921951/truth-value-of-a-series-is-ambiguous-use-a-empty-a-bool-a-item-a-any-o
MxGr20
  • 77
  • 6
  • That error says you are comparing a series with a number. not a number with a number. That series can have multiple values right? so how can you compare with a single value? – Pygirl Mar 17 '21 at 18:10
  • Can you give me a clear sample data coz I am getting confused. Your final expected output has column `Hour (of Day)` but the original one has two columns `hour` and `time` – Pygirl Mar 17 '21 at 18:18
  • @Pygirl I thought that by using a for loop for each row it would be possible to make a comparison with a datetime and a single value and add the M, A or A if the datetime falls between a certain timeframe. The available inputs individual appliance, in this example house3_kettle and a dataset with all the appliances, house3. Hour (of Day) should contain the 15-min time resolution that is most common for that appliance (not null df3). Time (of Day) is based on the 15-min time resolution as well. If an appliance is active between 06:00 and 12:00 it should be assigned an M. Does this help? – MxGr20 Mar 17 '21 at 19:24
  • @Pygirl I managed to get the first results for the hours! I have added the code below. Hopefully this give a better understanding of what I try to accomplish – MxGr20 Mar 18 '21 at 12:52
  • I can't see the updated code. – Pygirl Mar 18 '21 at 12:52
  • @Pygirl it should be there, I had to refresh it :) – MxGr20 Mar 18 '21 at 12:56
  • Just give before(Input) and after(output) DataFrame then It would be easier for me to provide you the solution. – Pygirl Mar 18 '21 at 12:57
  • @Pygirl I hope I made it more clear for you now! I removed a lot of noise – MxGr20 Mar 18 '21 at 13:05
  • Lets have discussion after 2 hrs(I will be back to home). Right now I am in ofc :) then let's work together on this problem :) – Pygirl Mar 18 '21 at 13:07
  • 1
    @Pygirl amazing! In the meantime I will try to make my code less 'copy + paste'. I am very new to Python but it should be possible to put it somehow in a loop so that I don't need to copy paste and manually have to change the appliance all the time. I have a different dataset with 50 appliances instead of 4 so that is though – MxGr20 Mar 18 '21 at 13:10
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/230091/discussion-between-pygirl-and-mxgr20). – Pygirl Mar 18 '21 at 15:48

0 Answers0