I have this data frame:

import pandas as pd

data_time = {'DateTime': ['2019-01-01 00:32:39-04:00','2019-01-01 00:34:52-04:00','2019-01-01 01:01:02-04:00','2019-01-01 01:05:10-04:00','2019-01-01 01:01:11-04:00'],
             'Bldg_Code': ['BLDG_A','BLDG_B','BLDG_C','BLDG_A','BLDG_E'],
             'Bldg_Name': ['Building A','Building B','Building C','Building A','Building D']}
data_time_df = pd.DataFrame(data_time)
data_time_df['DateTime'] = pd.to_datetime(data_time_df['DateTime'])
                    DateTime Bldg_Code   Bldg_Name
0  2019-01-01 00:32:39-04:00    BLDG_A  Building A
1  2019-01-01 00:34:52-04:00    BLDG_B  Building B
2  2019-01-01 01:01:02-04:00    BLDG_C  Building C
3  2019-01-01 01:05:10-04:00    BLDG_A  Building A
4  2019-01-01 01:01:11-04:00    BLDG_E  Building D

I want to group the rows into 15-minute intervals, while also showing the intervals that have no data. Here's the result I'm looking for:

                     DateTime Bldg_Code   Bldg_Name  count
0   2019-01-01 00:00:00-04:00    BLDG_A  Building A      0
1   2019-01-01 00:00:00-04:00    BLDG_B  Building B      0
2   2019-01-01 00:00:00-04:00    BLDG_C  Building C      0
3   2019-01-01 00:00:00-04:00    BLDG_D  Building D      0
4   2019-01-01 00:15:00-04:00    BLDG_A  Building A      0
5   2019-01-01 00:15:00-04:00    BLDG_B  Building B      0
6   2019-01-01 00:15:00-04:00    BLDG_C  Building C      0
7   2019-01-01 00:15:00-04:00    BLDG_D  Building D      0
8   2019-01-01 00:30:00-04:00    BLDG_A  Building A      1
9   2019-01-01 00:30:00-04:00    BLDG_B  Building B      1
10  2019-01-01 00:30:00-04:00    BLDG_C  Building C      0
11  2019-01-01 00:30:00-04:00    BLDG_D  Building D      0
12  2019-01-01 00:45:00-04:00    BLDG_A  Building A      0
13  2019-01-01 00:45:00-04:00    BLDG_B  Building B      0
14  2019-01-01 00:45:00-04:00    BLDG_C  Building C      0
15  2019-01-01 00:45:00-04:00    BLDG_D  Building D      0
16  2019-01-01 01:00:00-04:00    BLDG_A  Building A      1
17  2019-01-01 01:00:00-04:00    BLDG_B  Building B      0
18  2019-01-01 01:00:00-04:00    BLDG_C  Building C      1
19  2019-01-01 01:00:00-04:00    BLDG_D  Building D      1

Here's my code so far:

data_time_df.groupby([
    pd.Grouper(freq='15T', key='DateTime'),
    'Bldg_Code',
    'Bldg_Name'
]).agg(total_swipes=('DateTime', 'count')).sort_values('DateTime')

As you can see, my code only groups and shows the data that actually exists:

                                                total_swipes
DateTime                  Bldg_Code Bldg_Name               
2019-01-01 00:30:00-04:00 BLDG_A    Building A             1
                          BLDG_B    Building B             1
2019-01-01 01:00:00-04:00 BLDG_A    Building A             1
                          BLDG_C    Building C             1
                          BLDG_E    Building D             1
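
One direction that might work is to build the full grid of 15-minute bins and (Bldg_Code, Bldg_Name) pairs explicitly and reindex the grouped counts against it. Below is a rough, untested sketch of that idea (the explicit interval range and the helper names counts, intervals, buildings, full_index and result are just placeholders, and the building pairs are taken from the data as-is), in case it helps clarify what I'm after, but I'm hoping there's a cleaner, more idiomatic way:

# Grouped counts, same as above, but kept as a Series named 'count'
counts = (data_time_df
          .groupby([pd.Grouper(freq='15T', key='DateTime'),
                    'Bldg_Code', 'Bldg_Name'])
          .size()
          .rename('count'))

# Every 15-minute bin in the window of interest (start/end chosen by hand here)
intervals = pd.date_range('2019-01-01 00:00:00-04:00',
                          '2019-01-01 01:00:00-04:00', freq='15T')

# Every (Bldg_Code, Bldg_Name) pair that actually occurs in the data
buildings = data_time_df[['Bldg_Code', 'Bldg_Name']].drop_duplicates()

# Cross the bins with the building pairs, then reindex and fill the gaps with 0
full_index = pd.MultiIndex.from_tuples(
    [(t, code, name)
     for t in intervals
     for code, name in buildings.itertuples(index=False)],
    names=['DateTime', 'Bldg_Code', 'Bldg_Name'])

result = counts.reindex(full_index, fill_value=0).reset_index()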
  • It's a lot easier to read the data as a table, so I added it to your question. IMO that makes the `result` dict redundant so I removed it, but by all means add it back if you disagree. For related advice, check out [How to make good reproducible pandas examples](/q/20109391/4518341). – wjandrea Oct 03 '22 at 18:30
  • Why bother including half an hour with no data? Does the result have to align to hours? If so, then shouldn't 01:15-01:45 be included? – wjandrea Oct 03 '22 at 18:43
  • Hi, what do you mean by half an hour with no data? Yes, 01:15, 01:30, 01:45, and every other 15-minute interval until the latest available date time (let's say 2019-12-31 23:45:00-04:00) should exist in the data frame, even though they don't have any total_swipes value. – mhutagalung Oct 03 '22 at 18:53
  • What I mean is, the top part of the data frame has all 0 counts, so you might as well omit it. And where your actual data is a whole year, if it's just as sparse as this example, there's going to be thousands of rows with 0 counts. Do you actually need them? I mean, what do you need the result for exactly? – wjandrea Oct 03 '22 at 19:02
  • Oh okay, gotcha. The answer to that is yes, I do need the 0 data as well. Swipes mean people coming into the building, and since I don't have data on when each person leaves the building, I'm planning to look at the amount of electricity used every 15 minutes in relation to the number of swipes. So, there might be cases where the electricity used is high but the swipes are low or even 0, which means there are a lot of people staying inside the building. Hope that makes sense. – mhutagalung Oct 03 '22 at 19:16
  • Hmm... So you already have a dataframe of electricity usage? For putting them together, you could just merge your groupby onto that existing dataframe, no? – wjandrea Oct 03 '22 at 19:25
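
To make the last suggestion concrete, here is a minimal sketch of what merging the grouped counts onto an electricity dataframe might look like (electricity_df, its kWh column, and the sample rows are made up for illustration; counts is the grouped Series from the sketch above):

# Hypothetical electricity readings: one row per building per 15-minute bin
electricity_df = pd.DataFrame({
    'DateTime': pd.to_datetime(['2019-01-01 00:30:00-04:00',
                                '2019-01-01 00:30:00-04:00',
                                '2019-01-01 00:45:00-04:00']),
    'Bldg_Code': ['BLDG_A', 'BLDG_B', 'BLDG_A'],
    'kWh': [12.4, 7.9, 11.1],
})

# Left-merge the swipe counts; bins with no swipes come back as NaN, then fill with 0
merged = (electricity_df
          .merge(counts.reset_index(), on=['DateTime', 'Bldg_Code'], how='left')
          .fillna({'count': 0}))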
