-1

i want to convert a column in a table from a csv.file that has dates in the format of Excel 5-digits. i used the code below

Cab_Data['Date of Travel'].apply(lambda X: (datetime.utcfromtimestamp(0) + timedelta(int(x)).strftime('%d-%m-%Y'))

i keep getting an error message saying 'incomplete input'. what I do please.

I have tried this code

I expect to the values in the column to be converted to dates. for example:

40055 = 12/3/22 (just an example).
Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91
  • 1
    your code is correct, but, at the lambda, you're declaring the X on uppercase and using it in lower case at int(x) – skulden Mar 06 '23 at 21:13
  • Thank you for the correction. I tried the code again and got the error message 'TypeError: int() argument must be a string, a bytes-like object or a real number, not 'Timestamp' – imhanlahimi wendy Mar 06 '23 at 21:26
  • 1
    1) The error is saying `x` is already a timestamp. To confirm what does: `print(type(x))` return? 2) `(datetime.utcfromtimestamp(0) + timedelta(40055)).strftime('%d-%m-%Y') '01-09-2079'` – Adrian Klaver Mar 06 '23 at 22:04

1 Answers1

0

Errors in Posted Code

  • Corections to your code (missing parenthesis and use of X rather than x):

    df['Date of Travel'].apply(lambda x:(datetime.utcfromtimestamp(0) + timedelta(days=int(x))).strftime('%d-%m-%Y'))

  • But this gives incorrect dates since:

    • Excel 5 digit timestamp starts in 1900
    • while utcfromtimestamp(0)starts in 1970

Working code

We modify the function from How to convert a given ordinal number (from Excel) to a date by Martijn Pieters to convert from 5 digit Excel to the desired date format.

from datetime import datetime, timedelta

def from_excel_ordinal(ordinal: float, _epoch0=datetime(1899, 12, 31)) -> datetime:
    if ordinal >= 60:
        ordinal -= 1  # Excel leap year bug, 1900 is not a leap year!
    return (_epoch0 + timedelta(days=ordinal)).replace(microsecond=0).strftime('%d-%m-%Y')

Usage

df['Date of Travel'].apply(from_excel_ordinal)

Test

data = {'Date of Travel':[40055, 0, 4200]}
df = pd.DataFrame(data)
result = df['Date of Travel'].apply(from_excel_ordinal)
print(result)

Output

0    30-08-2009   # note: 40055 is Aug. 30, 2009 not March 12, 2022 as states in OP
1    31-12-1899
2    01-07-1911
Name: Date of Travel, dtype: object
DarrylG
  • 16,732
  • 2
  • 17
  • 23