5

I have a .xlsb file that I want to use pandas and analyse. I have found how to use pyxlsb to open the file and create another dataset. However, the problem now is that the time formats have changed into a different number format (e.g 41256).

The code I'm using at the moment is:

dataset = []

with open_xlsb(file) as wb: #opening an xlsb file workbook
    with wb.get_sheet(1) as sheet1:
        for row in sheet1.rows():
            dataset.append([item.v for item in row])

dataset= pd.DataFrame(dataset[1:], columns=dataset[0])

I have already tried the convert_date as follows:

convert_date(dataset)

I have also tried the to_datetime function, but unsure if I used it correctly. For reference, the dataset I am using has dates in multiple columns and rows so I'm looking for a way to covert all of them into the right format, whilst ignoring any errors.

EDIT: So I don't have a single column with "Date", rather I have multiple columns, such as StartDate, EndDate, Last Updated and a few others. The result I want to see is if I go to a column, e.g dataset.columns['StartDate'], I want to get a date value, such as 15/03/2019, as opposed to 42156.

Any help would be much appreciated!

JollyKinG
  • 97
  • 1
  • 5
  • Are you looking for a `pandas` specific solution? I'm sure you could accomplish this through the `datetime` module if not – Reedinationer Mar 27 '19 at 23:17
  • What is the expected output here? Your title and the question contents don't quite match up. – roganjosh Mar 27 '19 at 23:25
  • 1
    Possible duplicate of [Number to Date Conversion using Pandas in Python?](https://stackoverflow.com/questions/46154788/number-to-date-conversion-using-pandas-in-python) – roganjosh Mar 27 '19 at 23:42
  • That's a _speculative_ dupe because I'm not sure the exact date formats translate between the libraries (turning a numerical input to a date format) – roganjosh Mar 27 '19 at 23:42
  • Your title says "MM/DD/YYY". You don't really want a 3-digit year, do you? (If your requirements permit it, consider using ISO-8601 format "YYYY-MM-DD". https://xkcd.com/1179/) – Keith Thompson Mar 27 '19 at 23:51
  • Sorry, it is meant to be MM/DD/YYYY (typo) – JollyKinG Mar 28 '19 at 00:01

2 Answers2

4

I had a similar issue which I was able to overcome due to this answer.

Try the function datetime.fromordinal from the datetime module as follows:

from datetime import datetime


# for integers
def convert_int_date(ordinal_date):
    date_time = datetime.fromordinal(
        datetime(1900, 1, 1).toordinal() + ordinal_date - 2
    )
    return date_time

dataset["StartDate"] = dataset["StartDate"].astype(int).apply(convert_int_date)

You can then format your date as you please.

dataset["StartDate"] = dataset["StartDate"].dt.strftime("%m/%d/%Y")

In my particular case, every date was given by an integer. Hours, minutes and seconds were not taken into account. If you do need to take those into account (or if you have a floating number as a date) try the following:

# for floats
def convert_float_date(ordinal_date):
    hours, r = divmod(ordinal_date, 1)
    minutes, r = divmod(r*60, 1)
    seconds = r*60

    date_time = datetime.fromordinal(
        datetime(1900, 1, 1).toordinal() + ordinal_date - 2
    )
    datetime = date_time.replace(hour=hours, minute=minutes, second=seconds)
    return date_time

dataset["StartDate"] = dataset["StartDate"].astype(float).apply(convert_float_date)

Credits

0

Supposing your date column is 'Date', the command would be something like this:

dataset['Date'] = pd.to_datetime(dataset['Date'], format='%m/%d/%Y')
accdias
  • 5,160
  • 3
  • 19
  • 31
  • That only does half the job. Now you have a datetime object, not a formatted string – roganjosh Mar 27 '19 at 23:21
  • Well actually, the question is unclear. The title reads as though they want a string at the end from `strftime` basically – roganjosh Mar 27 '19 at 23:23
  • I didn't see the OP saying he wants a string. He says he wants dates formatted as MM/DD/YYYY. – accdias Mar 27 '19 at 23:23
  • It's implied in the title but I've asked for clarification. – roganjosh Mar 27 '19 at 23:25
  • To your edit: and the output of your answer would not be such an output. The `format` argument to `to_datetime` is to tell pandas how to parse it _out_ of a string and to a datetime object. It won't print or write back out in the specified format if you leave it with just this step. – roganjosh Mar 27 '19 at 23:27
  • No. It will format the ```datetime``` column using ```format```. I do that all the time here. – accdias Mar 27 '19 at 23:29
  • Back at my PC instead of phone so I can see better. I think we've both gone off-course; it looks like [this issue](https://stackoverflow.com/a/46155069/4799172) – roganjosh Mar 27 '19 at 23:43
  • I agree. The workbook is probably using another format and the columns weren't converted to a ```datetime``` object yet. – accdias Mar 27 '19 at 23:45
  • Just for further clarification here, roganjosh is right. As the OP explains, he's getting the ordinal numbers from excel dates (which happens when bringing in xlsb files. Using the ordinal 41256 (which was referenced) in pd.to_datetime() would not produce the desired result. – kowpow Nov 19 '21 at 12:52