1

I have a CSV file with thousands of rows. The file has 3 columns Date, Time and Value. I want to first loop through the date column and then the time column then add the value between two particular time. Is there any function in Pandas to achieve this? Below is my sample CSV. The dates are not continuous but they are in ascending order.

Date    Time    Value
29-Jan-18   11:00   10
29-Jan-18   12:00   11
29-Jan-18   13:00   12
29-Jan-18   14:00   13
31-Jan-18   11:00   90
31-Jan-18   12:00   91
31-Jan-18   13:00   92
31-Jan-18   14:00   93

I am looking for output as "For 29-Jan-18 and for time 11 to 13, the sum of the values is 33."

Not exactly 11-12 but there is a series. The calculation will be done for 9-10,10-11,11-13,13-15 but in the CSV file, every hour is present.

cs95
  • 379,657
  • 97
  • 704
  • 746
user3198755
  • 477
  • 2
  • 10
  • 21
  • The time, do you do 11:12, then 12:13, then 13:14 or?? – Onyambu Aug 08 '18 at 15:17
  • Not exactly 11-12 but there is a series. The calculation will be done for 9-10,10-11,11-13,13-15 but in the csv file every hour is present. – user3198755 Aug 08 '18 at 15:22
  • mhh, what is your rolling window? 2 or 3, because you have 9:10,10:11 which has a rolling window of 2 yet you also have 11:13 and here the rolling window is 3.. Why the change? why not 11:12 – Onyambu Aug 08 '18 at 15:23
  • can you include the output you expect? – Onyambu Aug 08 '18 at 15:24
  • @Onyambu That's the challenge that I am facing. The only fixed timeframes everyday is 9-10,10-11,11-13,13-15 – user3198755 Aug 08 '18 at 15:28
  • I have updated the question with the output for clarification – user3198755 Aug 08 '18 at 15:29
  • Well then your question is not clear enough. Include that as a challenge in your question then give a comprehensive data example that will include the challenge and the output that you need – Onyambu Aug 08 '18 at 15:29
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/177658/discussion-between-user3198755-and-onyambu). – user3198755 Aug 08 '18 at 15:30
  • You have to include the expected answer to the whole column and how it should be presented. Ie should 33 for example be in row 1 or 2 or 3 etc... Just include the answer column, in its entirety in the question – Onyambu Aug 08 '18 at 15:31
  • @Onyambu - The answer need not be always as a column. It can also be in the form of a new dataframe or just a print statement - calculating and just streaming the output using print. I have already provided how the output should look like – user3198755 Aug 08 '18 at 15:35
  • Will you ever need to go across multiple days? Or do you always specify a single day and a range of hours within that single day? – Matt Messersmith Aug 08 '18 at 15:36
  • @MattMessersmith I have to go across multiple days but the range of hours 9-10,10-11,11-13,13-15 is fixed in a single day – user3198755 Aug 08 '18 at 15:40
  • Closing this as duplicate for the 2k people who arrived at this question, but found no answer. – cs95 Jun 17 '20 at 05:00

2 Answers2

0

Main idea is to beat your date/time columns into datetime objects, and then do the date filtering with those objects. I assumed they were string to start.

I'm not sure how your dates are incoming, but hopefully you can figure out how to create datetime objects out of them. If not, let me know, and we can update. Here's an example of how you could filter by date and sum the values over a start/end date:

import numpy as np
import pandas as pd
import datetime

cols = ["Date", "Time", "Value"]

rows = [["29-Jan-18", "11:00", 10],
        ["29-Jan-18", "12:00", 11],
        ["29-Jan-18", "13:00", 12],
        ["29-Jan-18", "14:00", 13],
        ["31-Jan-18", "11:00", 90],
        ["31-Jan-18", "12:00", 91],
        ["31-Jan-18", "13:00", 92],
        ["31-Jan-18", "14:00", 93]]

df = pd.DataFrame(rows, columns=cols)

df['datetime_str'] = df['Date'].str.cat(df['Time'].astype(str), sep=" ")
df['datetime'] = df['datetime_str'].apply(lambda d: datetime.datetime.strptime(d, '%d-%b-%y %H:%M'))
print(df)
print()

# Hopefully you can figure out how to beat incoming start/end date into datetime.
# If not, let me know how the start/end are coming in and we can write a function to do it
start_date = datetime.datetime.strptime("29-Jan-18 11:00", '%d-%b-%y %H:%M')
end_date = datetime.datetime.strptime("29-Jan-18 13:00", '%d-%b-%y %H:%M')

value_sum = df[ (df['datetime'] >= start_date) & (df['datetime'] <= end_date) ]['Value'].sum()
print("Value sum from " + str(start_date) + " to " + str(end_date) + ": ", value_sum)

# Works accross days as well                                                                                                                                                                                
start_date = datetime.datetime.strptime("29-Jan-18 13:00", '%d-%b-%y %H:%M')
end_date = datetime.datetime.strptime("31-Jan-18 13:00", '%d-%b-%y %H:%M')

value_sum = df[ (df['datetime'] >= start_date) & (df['datetime'] <= end_date) ]['Value'].sum()
print("Value sum from " + str(start_date) + " to " + str(end_date) + ": ", value_sum)

This outputs:

        Date   Time  Value     datetime_str            datetime
0  29-Jan-18  11:00     10  29-Jan-18 11:00 2018-01-29 11:00:00
1  29-Jan-18  12:00     11  29-Jan-18 12:00 2018-01-29 12:00:00
2  29-Jan-18  13:00     12  29-Jan-18 13:00 2018-01-29 13:00:00
3  29-Jan-18  14:00     13  29-Jan-18 14:00 2018-01-29 14:00:00
4  31-Jan-18  11:00     90  31-Jan-18 11:00 2018-01-31 11:00:00
5  31-Jan-18  12:00     91  31-Jan-18 12:00 2018-01-31 12:00:00
6  31-Jan-18  13:00     92  31-Jan-18 13:00 2018-01-31 13:00:00
7  31-Jan-18  14:00     93  31-Jan-18 14:00 2018-01-31 14:00:00

Value sum from 2018-01-29 11:00:00 to 2018-01-29 13:00:00:  33
Value sum from 2018-01-29 13:00:00 to 2018-01-31 13:00:00:  298
Matt Messersmith
  • 12,939
  • 6
  • 51
  • 52
  • This looks great but the challenging part is looping through the date. They are in ascending order but not continuous. It's something like 1st Jan, 2nd Jan, 3rd Jan, 6th Jan, 8th Jan,9th Jan,10th Jan etc – user3198755 Aug 09 '18 at 02:23
  • I'm not sure I follow: the dates here aren't continuous either. The code above will work over multiple days, regardless of continuity of dates. I've added an example. – Matt Messersmith Aug 09 '18 at 10:41
0

You can achieve this through the use of the resample method.

First you need to merge your date column and time column to create a single date time index. Assuming your two columns are strings (if they are not you can call as_type(str) on them), you can concatenate the columns, convert them to a datetime column, and convert that datetime column to a DatetimeIndex for the DataFrame.

Then call resample with the rule set to '3H' to create bins 3 hours in width and set base=11 to start at 11am (of course you can change this if you want/need to) and sum on the DataFrame.

Here is an example:

import pandas as pd
import datetime as dt

df.index = pd.DatetimeIndex(pd.to_datetime(df.Date.str.cat(df.Time, sep=' ')))
df = df.resample('1H').sum() # to fill missing values
# get values for 9 & 10
df1 = df[(dt.time(9) <= df.index.time) & (df.index.time <= dt.time(10))]
# get values for 11-13 & 13-15
two_hour_ranges = df[df.index.time >= dt.time(11)].resample('2H', base=11).sum()
df2 = two_hour_ranges[(dt.time(11) <= two_hour_ranges.index.time) & (two_hour_ranges.index.time < dt.time(15))]

# merge
df = pd.concat([df1, df2]).sort_index()

And sample output for df using your given data as input (with values for 9:00 & 10:00 on 1/29 and 1/31 added by me):

                     Value
Date                     
2018-01-29 09:00:00      4
2018-01-29 10:00:00      5
2018-01-29 11:00:00     21
2018-01-29 13:00:00     25
2018-01-30 09:00:00      0
2018-01-30 10:00:00      0
2018-01-30 11:00:00      0
2018-01-30 13:00:00      0
2018-01-31 09:00:00     70
2018-01-31 10:00:00     80
2018-01-31 11:00:00    181
2018-01-31 13:00:00    185

Here are the docs on resample for more: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.resample.html

Henry Woody
  • 14,024
  • 7
  • 39
  • 56
  • This would work if the time is 3 hours every time but the range of hours is 9-10,10-11,11-13,13-15 and this is fixed for every day. – user3198755 Aug 09 '18 at 02:20