0

I'm loading the excel(xlsx) file exported from dashboard in OpenPyxl and some of date values are defaulting into 44774.375 and it should be converted to 01-08-2022 09:00:00 to load into SF table.

Also In dashboard Users are giving 4/21/2023 10:00:00 PM this format but I should convert into 21-04-2023 10:00:00

not sure how to apply this both formatting condition in columns FE RETA, RETA.

I'm currently using the below code,

import pandas as pd
from openpyxl import load_workbook

wb = load_workbook(filename= "file.xlsx", data_only = True)
sheet_names = wb.get_sheet_names()
name = sheet_names[0]
sheet_ranges = wb[name]
df = pd.DataFrame(sheet_ranges.values, index = none)
print(df)

#To replace (4/21/2023 10:00:00 PM) format to (21-04-2023 10:00:00)
df['FE RETA'] = pd.to_datetime(df['FE RETA'], format = '%m/%d/%Y %I:%M:%S %p', errors='coerce')
df['FE RETA'] =df['FE RETA'].dt.strftime('%d-%m-%Y %H:%m:%S')

#To replace all remaining number format to (21-04-2023 10:00:00) 
#but its only working if the entire column is in number format
df['FE RETA'] = pd.to_datetime(df['FE RETA'],unit='d') + dt.datetime(1899,12,30)
print(df)

but however my date columns having also proper datetime format values too. So it throwing me the error.

TypeError: Invalid type for timedelta scalar: <class 'datetime.datetime'>

Source file:

Tht|       FE RETA               |  RETA                | TPM SLACycletime
---------------------------------------------------------------------------
US |    4/21/2023 10:00:00 PM    |4/21/2023 10:30:00 AM | Invalid Data
US |    4/22/2023 11:45:00 PM    |44781.54167           | 558:19:30
US |    4/21/2023 10:30:00 AM    |10-09-2022 18:03:00   | 111:44:26
US |    01-08-2022 10:00:00      |44778.41667           | 15:44:26
US |    44774.375                |44775.52083           | Invalid Data

Desired Output:

Tht|       FE RETA               |  RETA                | TPM SLACycletime
---------------------------------------------------------------------------
US |    21-04-2023 10:00:00      |21-04-2023 10:30:00   | Invalid Data
US |    22-04-2023 11:45:00      |08-08-2022 13:00:00   | 558:19:30
US |    21-04-2023 10:30:00      |10-09-2022 18:03:00   | 111:44:26
US |    01-08-2022 10:00:00      |05-08-2022 10:00:00   | 15:44:26
US |    01-08-2022 09:00:00      |02-08-2022 12:30:00   | Invalid Data

User is not agreeing to correct it while uploading, he's expecting to clean the date format in automation itself. Please help on this.

Thanks

1 Answers1

1

When converting excel datetimes there is actually a different function: pd.to_timedelta(). To deal with which values should receive which function, I would create a mask and then invert it (or create another mask if there are more than 2 formats).

def func(col):


    #Check for the numerical excel date format
    mask = pd.to_numeric(df[col], errors='coerce').notna()
    
    #Convert excel dates to datetime
    df.loc[mask, col] = pd.to_timedelta(df.loc[mask, col], unit='d') + pd.Timestamp('1899-12-30')
    
    #Convert the string date format to datetime
    df.loc[~mask, col] = pd.to_datetime(df.loc[~mask, col], errors='coerce', format='%m/%d/%Y %I:%M:%S %p')
    #Convert to your desired format
    df[col] =df[col].dt.strftime('%d/%m/%Y %I:%M:%S %p')

Hopefully that works for you

Edit: To help with editing the date formats, use this documentation and change as you like

Alexander
  • 310
  • 1
  • 9
  • Thanks Alex. It's working great. However the time value is not coming as precisely as input file. Like 4/21/2023 10:30:00 AM instead it's coming as 21-04-2023 10:04:00. – Devaraj Mani Maran Aug 01 '23 at 17:18
  • Also Alex, could you please make it as function I need to use it for other columns too. So I can directly call the function. – Devaraj Mani Maran Aug 01 '23 at 17:23
  • @DevarajManiMaran Try that now. To add the PM I believe it's %p in the formatting. As for the time offset, I am not sure what is causing that. Likely weird formatting in the data or pd.to_datetime not behaving as we'd expect. I suggest you debug the code and see which dates are the ones with imprecision errors, and where that may come about. – Alexander Aug 04 '23 at 07:41