2

I read in an Excel file which contains a column of time values like so:

    Time
1   3:00a
2   
3   3:30a
4   
5   4:00a
6   
7   4:30a
8   
9   5:00a
10  
11  5:30a
12  
13  6:00a
14  
15  6:30a
16  
17  7:00a
18  
19  7:30a
20  
21  8:00a
22  
23  8:30a
24  
25  9:00a
26  
27  9:30a
28  
29  10:00a
30  
31  10:30a
32  
33  11:00a
34  
35  11:30a
36  
37  12:00p
38  
39  12:30p
40  
41  1:00p
42  
43  1:30p
44  
45  2:00p
46  
47  2:30p
48  
49  3:00p
50  
51  3:30p
52  
53  4:00p
54  
55  4:30p
56  
57  5:00p
58  
59  5:30p
60  
61  6:00p
62  
63  6:30p
64  
65  7:00p
66  
67  7:30p
68  
69  8:00p
70  
71  8:30p
72  
73  9:00p
74  
75  9:30p
76  
77  10:00p
78  
79  10:30p
80  
81  11:00p
82  
83  11:30p
84  
85  12:00a
86  
87  12:30a
88  
89  1:00a
90  
91  1:30a
92  
93  2:00a
94  
95  2:30a

To add additional clarification:

enter image description here

I can also read the date from the filename, so I get for example 012622. It is in MMDDYY format in a string.

I am looking to accomplish two things:

  1. Convert the column to pd.datetime format

  2. Fill in the "blank" spaces with the 15 minute intervals, producing a column that looks like

    | 3:00:00 | | 3:15:00 | | 3:30:00 | | 3:45:00 | | 4:00:00 | | 4:15:00 | | 4:30:00 | | 4:45:00 | | 5:00:00 |

etc...

I have tried to do this first by referencing this question, however:

n = pd.read_clipboard()
new_n = pd.to_timedelta(n+':00')

Results in:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [42], in <module>
----> 1 new_n = pd.to_timedelta(n.astype('str')+':00')

File filepath\_venv\lib\site-packages\pandas\core\tools\timedeltas.py:134, in to_timedelta(arg, unit, errors)
    132     return _convert_listlike(arg, unit=unit, errors=errors)
    133 elif getattr(arg, "ndim", 1) > 1:
--> 134     raise TypeError(
    135         "arg must be a string, timedelta, list, tuple, 1-d array, or Series"
    136     )
    138 if isinstance(arg, str) and unit is not None:
    139     raise ValueError("unit must not be specified if the input is/contains a str")

TypeError: arg must be a string, timedelta, list, tuple, 1-d array, or Series

I did find this question which seems to help reference getting the column into a quarter-hour format, however, it needs to be a datetime first.

artemis
  • 6,857
  • 11
  • 46
  • 99
  • 1
    Your filename contains the date? – Corralien Jan 26 '22 at 20:23
  • @Corralien Correct, it is a daily drop of an Excel file that I am running and parsing every day, think of something like: `drop_012622.xlsx` – artemis Jan 26 '22 at 20:37
  • 1
    Note that "a" and "p" are no valid additions to the time. A valid time format would be '03:00 PM'. So first fix your time strings to a correct format. Then you could use `datetime.datetime.strptime('3:00 PM', '%I:%M %p')` to convert the string to datetime. After that you can work on filling the blanks. – 576i Jan 26 '22 at 20:45
  • 1
    when loading your data in a dataframe with read_excel, the blank cell becomes nan, right? – Ben.T Jan 26 '22 at 20:46
  • But the first is _not_ pm, it is am. I understand that it is not valid, that is the entire point of the question is helping to convert the _current_ format to a _proper_ format – artemis Jan 26 '22 at 20:46
  • 1
    The first line is "3:00 AM". Which is just like "3:00" so it's a bad test case... – 576i Jan 26 '22 at 20:46
  • @Ben.T correct, `df.dropna()` removes those rows. – artemis Jan 26 '22 at 20:46
  • @576i that is literally a drop of the data; I'm not sure how a more specific test case could be built being that it is literally a copy/paste of the problem I am trying to resolve. – artemis Jan 26 '22 at 20:47
  • Related: [Custom pandas Datetime offset / frequency for trading times: 9:30AM-4:00PM business days except US holidays](https://stackoverflow.com/questions/44533880/custom-pandas-datetime-offset-frequency-for-trading-times-930am-400pm-busin) – smci Jan 26 '22 at 21:03

1 Answers1

4

IIUC. You have just to recreate the starting point. With freq (15 minutes) and periods (length of dataframe), you can create the DatetimeIndex

date = '012622'  # extract the date from filename here
start = pd.to_datetime(f"{date} {df['Time'].iloc[0]}m", format='%m%d%y %I:%M%p')
df['Time'] = pd.date_range(start, freq='15T', periods=len(df))

Output:

>>> df
                  Time
1  2022-01-26 03:00:00
2  2022-01-26 03:15:00
3  2022-01-26 03:30:00
4  2022-01-26 03:45:00
5  2022-01-26 04:00:00
..                 ...
91 2022-01-27 01:30:00
92 2022-01-27 01:45:00
93 2022-01-27 02:00:00
94 2022-01-27 02:15:00
95 2022-01-27 02:30:00

[95 rows x 1 columns]
Corralien
  • 109,409
  • 8
  • 28
  • 52