0

I have a df in python,which has a column 'close date'.And it has pulled few values in excel date format.For eg(index 3 ,11 and 12) Is there a python query to convert them to python date. Basicaly need the whole column to be in datetime format. As they are for index (1,2,4,5,6,7)

Index   Close Date

1       12/6/2011 0:00

2       1/28/2011 0:00

3       40571

4       12/28/2010 0:00

5       12/13/2010 0:00

6       11/15/2011 0:00

7       8/24/2011 0:00

11      40550

12      40550

Miss
  • 1

2 Answers2

1

TL;DR:

Use..

from datetime import datetime

df['Parsed_Date2'] = [datetime.fromordinal(datetime(1900, 1, 1).toordinal() + int(x) - 2) \
                         if '/' not in x else pd.to_datetime(x) for x in df['Close Date']]

Bit more thinking... This is for if you have your column in Pandas already and want an option to deal with an existing column (of strings); if you're reading in an Excel file other options may work at the time of accessing the file.

Code:

import pandas as pd
from datetime import datetime

# create a test dataframe
df = pd.DataFrame({'Close Date': ['12/6/2011 0:00', '1/28/2011 0:00', '40571', '12/28/2010 0:00', '12/13/2010 0:00', '11/15/2011 0:00', '8/24/2011 0:00', '40550', '40550']})

# create a function to parse the 'Close Date' column
def excelDate(x):
    #check if x is an Excel number (like 40571) and process accordingly
    if '/' not in x:
        x = datetime.fromordinal(datetime(1900, 1, 1).toordinal() + int(x) - 2)
    #if it's not an Excel number just parse it as normal
    else:
        x = pd.to_datetime(x)
    return x

# apply the function above to an new column 'Parsed_Date' for comparison
df['Parsed_Date'] = df.apply(lambda x: excelDate(x['Close Date']), axis=1)

print(df)

Outputs:

        Close Date      Parsed_Date
0   12/6/2011 0:00      2011-12-06
1   1/28/2011 0:00      2011-01-28
2            40571      2011-01-28
3  12/28/2010 0:00      2010-12-28
4  12/13/2010 0:00      2010-12-13
5  11/15/2011 0:00      2011-11-15
6   8/24/2011 0:00      2011-08-24
7            40550      2011-01-07
8            40550      2011-01-07

Alternatives

If you don't like apply (very often slower)...

# Another Way To Parse: do the above code but as a list comprehension and without apply and a separate function
df['Parsed_Date2'] = [datetime.fromordinal(datetime(1900, 1, 1).toordinal() + int(x) - 2) \
                     if '/' not in x else pd.to_datetime(x) for x in df['Close Date']]

...or importing an extra module (datetime)...

# Yet Another Way To Parser: use @MrFuppes suggestion so you don't need to import datetime
df['Parsed_Date3'] = [pd.to_datetime(int(x), unit='D', origin='1899-12-30') \
                    if '/' not in x else pd.to_datetime(x) for x in df['Close Date']]

If you ran all of them (why?, why not!¬) you would output:

    Close Date          Parsed_Date     Parsed_Date2    Parsed_Date3
0   12/6/2011 0:00      2011-12-06      2011-12-06      2011-12-06
1   1/28/2011 0:00      2011-01-28      2011-01-28      2011-01-28
2   40571               2011-01-28      2011-01-28      2011-01-28
3   12/28/2010 0:00     2010-12-28      2010-12-28      2010-12-28
4   12/13/2010 0:00     2010-12-13      2010-12-13      2010-12-13
5   11/15/2011 0:00     2011-11-15      2011-11-15      2011-11-15
6   8/24/2011 0:00      2011-08-24      2011-08-24      2011-08-24
7   40550               2011-01-07      2011-01-07      2011-01-07
8   40550               2011-01-07      2011-01-07      2011-01-07

Timings:

Apply:                          2.45 ms ± 147 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Using datetime in list comp:    1.68 ms ± 21.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Using just pandas in list comp: 2.28 ms ± 77.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
MDR
  • 2,610
  • 1
  • 8
  • 18
  • note that `pd.to_datetime` takes a kwarg `origin` - which you can use to process excel dates, see e.g. [here](https://stackoverflow.com/a/65460255/10197418). – FObersteiner Jul 26 '21 at 06:40
0

first, try to convert the column to DateTime when you upload the data something like this

df1 = pd.read_excel(file, converters= {'COLUMN': pd.to_datetime})

if this don't work you can you .apply to columns

 ### this is really ugly code but it's just to give you an idea. 
 ### im sure you can make it better. 

def convertTime(date):
    try:
        return datetime.datetime.strptime(date, "%m/%d/%Y %H:%M") 
    except:
        return datetime.datetime.fromtimestamp(date) # I think the second is timestamp... 

db['date'] = db['date].apply(convertime)
   
Cesar Rodriguez
  • 192
  • 1
  • 15