It's the first time that I am using a dataset in .h5 format for which I am assigning keys to a new variable using the code below
# Assign every meter to a variable
house3_aggregate = ukdale.get('/building3/elec/meter1')
house3_kettle = ukdale.get('/building3/elec/meter2')
house3_electric_heater = ukdale.get('/building3/elec/meter3')
house3_laptop = ukdale.get('/building3/elec/meter4')
house3_projector = ukdale.get('/building3/elec/meter5')
Printing the first variable gives me the following structure:
print(house3_aggregate)
physical_quantity power
type apparent
2013-02-27 20:35:14+00:00 5.0
2013-02-27 20:35:20+00:00 4.0
2013-02-27 20:35:26+00:00 5.0
2013-02-27 20:35:32+00:00 5.0
2013-02-27 20:35:38+00:00 4.0
... ...
2013-04-08 06:14:22+01:00 168.0
2013-04-08 06:14:28+01:00 171.0
2013-04-08 06:14:34+01:00 176.0
2013-04-08 06:14:40+01:00 174.0
2013-04-08 06:14:53+01:00 3122.0
512327 rows × 1 columns
After using the .columns to get a better understanding of what I am seeing here I get the following output:
MultiIndex([('power', 'apparent')],
names=['physical_quantity', 'type'])
I want to transform this dataset into a dataset that only has two columns like so
Timestamp Energy Consumption
2013-02-27 20:35:14+00:00 5.0
2013-02-27 20:35:20+00:00 4.0
2013-02-27 20:35:26+00:00 5.0
2013-02-27 20:35:32+00:00 5.0
2013-02-27 20:35:38+00:00 4.0
... ...
2013-04-08 06:14:22+01:00 168.0
2013-04-08 06:14:28+01:00 171.0
2013-04-08 06:14:34+01:00 176.0
2013-04-08 06:14:40+01:00 174.0
2013-04-08 06:14:53+01:00 3122.0
512327 rows × 1 columns
I found this post and I tried to implement the code like so
house3_aggregate = house3_aggregate.reset_index()
print(house3_aggregate)
This gives me the following output
physical_quantity index power
type apparent
0 2013-02-27 20:35:14+00:00 5.0
1 2013-02-27 20:35:20+00:00 4.0
2 2013-02-27 20:35:26+00:00 5.0
3 2013-02-27 20:35:32+00:00 5.0
4 2013-02-27 20:35:38+00:00 4.0
... ... ...
512322 2013-04-08 06:14:22+01:00 168.0
512323 2013-04-08 06:14:28+01:00 171.0
512324 2013-04-08 06:14:34+01:00 176.0
512325 2013-04-08 06:14:40+01:00 174.0
512326 2013-04-08 06:14:53+01:00 3122.0
[512327 rows x 2 columns]
Besides the wrong column naming this seems like what I need. Expect that wen I call .column again the data still seems to be a MultiIndex
house3_aggregate.columns
MultiIndex([('index', ''),
('power', 'apparent')],
names=['physical_quantity', 'type'])
I also tried this approach and used the code below
house3_aggregate = house3_aggregate.to_frame()
print(house3_aggregate)
This gives me the following error
AttributeError: 'DataFrame' object has no attribute 'to_frame'
I am confused now as this error makes it seem like the data is already two-dimensional. Any idea on what I should do to get rid of the Multindex and transform the data in a two-dimensional data frame like below? Ultimately I want to use pd.concat to merge all the variables into a single dataset for house3
Timestamp Energy Consumption
2013-02-27 20:35:14+00:00 5.0
2013-02-27 20:35:20+00:00 4.0
2013-02-27 20:35:26+00:00 5.0
2013-02-27 20:35:32+00:00 5.0
2013-02-27 20:35:38+00:00 4.0
... ...
2013-04-08 06:14:22+01:00 168.0
2013-04-08 06:14:28+01:00 171.0
2013-04-08 06:14:34+01:00 176.0
2013-04-08 06:14:40+01:00 174.0
2013-04-08 06:14:53+01:00 3122.0
512327 rows × 1 columns