
I am trying to read columns from an Excel file at this URL: https://www.ema.europa.eu/sites/default/files/Medicines_output_european_public_assessment_reports.xlsx

I am reading this Excel file with the Python xlrd module, one row at a time, as a dict. (Note: I have converted the dict keys to lowercase.)

Example (a single row):

{
    'category': 'Human',
    'medicine name': 'Avastin',
    'therapeutic area': 'Carcinoma, Non-Small-Cell Lung, Breast Neoplasms, Ovarian Neoplasms, Colorectal Neoplasms, Carcinoma, Renal Cell',
    'international non-proprietary name (inn) / common name': 'bevacizumab',
    'active substance': 'bevacizumab',
    'product number': 'EMEA/H/C/000582',
    'patient safety': 'no',
    'authorisation status': 'Authorised',
    'atc code': 'L01XC07',
    'additional monitoring': 'no',
    'generic': 'no',
    'biosimilar': 'no',
    'conditional approval': 'no',
    'exceptional circumstances': 'no',
    'accelerated assessment': 'no',
    'orphan medicine': 'no',
    'marketing authorisation date': 38363.95833333334,
    'date of refusal of marketing authorisation': '',
    'marketing authorisation holder/company name': 'Roche Registration GmbH',
    'human pharmacotherapeutic group': 'Antineoplastic agents, ',
    'vet pharmacotherapeutic group': '',
    'date of opinion': '',
    'decision date': 43580.91666666666,
    'revision number': 51.0,
    'condition / indication': 'Bevacizumab in combination with fluoropyrimidine-based chemotherapy is indicated for treatment of adult patients with metastatic carcinoma of the colon or rectum.Bevacizumab in combination with paclitaxel is indicated for first-line treatment of adult patients with metastatic breast cancer. For further information as to human epidermal growth factor receptor 2 (HER2) status.Bevacizumab in combination with capecitabine is indicated for first-line treatment of adult patients with metastatic breast cancer in whom treatment with other chemotherapy options including taxanes or anthracyclines is not considered appropriate. Patients who have received taxane and anthracycline-containing regimens in the adjuvant setting within the last 12 months should be excluded from treatment with Avastin in combination with capecitabine. For further information as to HER2 status.Bevacizumab, in addition to platinum-based chemotherapy, is indicated for first-line treatment of adult patients with unresectable advanced, metastatic or recurrent non-small cell lung cancer other than predominantly squamous cell histology.Bevacizumab, in combination with erlotinib, is indicated for first-line treatment of adult patients with unresectable advanced, metastatic or recurrent non-squamous non-small cell lung cancer with Epidermal Growth Factor Receptor (EGFR) activating mutations.Bevacizumab in combination with interferon alfa-2a is indicated for first line treatment of adult patients with advanced and/or metastatic renal cell cancer.Bevacizumab, in combination with carboplatin and paclitaxel is indicated for the front-line treatment of adult patients with advanced (International Federation of Gynecology and Obstetrics (FIGO) stages III B, III C and IV) epithelial ovarian, fallopian tube, or primary peritoneal cancer.Bevacizumab, in combination with carboplatin and gemcitabine, is indicated for treatment of adult patients with first recurrence of platinum-sensitive epithelial ovarian, fallopian tube or primary peritoneal cancer who have not received prior therapy with bevacizumab or other VEGF inhibitors or VEGF receptor–targeted agents.Bevacizumab in combination with paclitaxel, topotecan, or pegylated liposomal doxorubicin is indicated for the treatment of adult patients with platinum-resistant recurrent epithelial ovarian, fallopian tube, or primary peritoneal cancer who received no more than two prior chemotherapy regimens and who have not received prior therapy with bevacizumab or other VEGF inhibitors or VEGF receptor–targeted agents.Bevacizumab, in combination with paclitaxel and cisplatin or, alternatively, paclitaxel and topotecan in patients who cannot receive platinum therapy, is indicated for the treatment of adult patients with persistent, recurrent, or metastatic carcinoma of the cervix.',
    'species': '',
    'atcvet code': '',
    'first published': 43321.43680555555,
    'revision date': 43704.375,
    'url': 'https://www.ema.europa.eu/en/medicines/human/EPAR/avastin'
}

The problem is that in Excel the Revision Date column shows 8/27/2019 9:00:00 AM, but when I read the file with xlrd the value comes back as 43704.375, because the column's type is date.

How can I read the date and time in the correct format using Python?

In short (43704.375 is not even a Unix timestamp for that date), how can I convert 8/27/2019 9:00:00 AM to 43704.375, and vice versa?

Harsha Biyani
    The format is simple enough: it's the number of days since Jan 1, 1900. 0.375 days is exactly 9 hours, which accounts for the time of 9:00:00 am. The conversion is a little tricky, since most years have 365, but some have 366, so it's not as simple as just computing `divmod(43704, 365)` – chepner Aug 30 '19 at 13:15
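A minimal stdlib-only sketch of the conversion in both directions, assuming the workbook uses Excel's default 1900 date system (xlrd `datemode` 0) and dates on or after 1 March 1900. Instead of counting leap years by hand, it uses 1899-12-30 as day 0: that offset absorbs both the serial numbering starting at 1 and Excel's fictitious 29 February 1900, so `timedelta` does the calendar arithmetic. The helper names `serial_to_datetime` and `datetime_to_serial` are hypothetical, and Excel serials carry no timezone, so the results are naive datetimes.

```python
from datetime import datetime, timedelta

# Assumption: 1900 date system (xlrd datemode 0), dates from 1900-03-01 onward.
# Day 0 is 1899-12-30, which compensates for Excel's fake 1900-02-29.
EXCEL_EPOCH_1900 = datetime(1899, 12, 30)
SECONDS_PER_DAY = 86400


def serial_to_datetime(serial: float) -> datetime:
    """Excel serial (whole days + fraction of a day) -> naive datetime."""
    # Round to whole seconds so float noise such as 43580.91666666666
    # becomes 22:00:00 instead of 21:59:59.999999.
    return EXCEL_EPOCH_1900 + timedelta(seconds=round(serial * SECONDS_PER_DAY))


def datetime_to_serial(dt: datetime) -> float:
    """Naive datetime -> Excel serial (days since the 1899-12-30 epoch)."""
    delta = dt - EXCEL_EPOCH_1900
    return delta.days + delta.seconds / SECONDS_PER_DAY + delta.microseconds / (SECONDS_PER_DAY * 1_000_000)


if __name__ == "__main__":
    print(serial_to_datetime(43704.375))  # 2019-08-27 09:00:00
    parsed = datetime.strptime("8/27/2019 9:00:00 AM", "%m/%d/%Y %I:%M:%S %p")
    print(datetime_to_serial(parsed))  # 43704.375
```

The 0.375 fraction is simply 9 hours out of 24, which is why the time survives the round trip exactly.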

1 Answer

Excel stores date values internally as floats (serial day numbers). So if you want xlrd to give you Excel date values as Python dates, use the xldate_as_tuple function, passing it the workbook's datemode, to convert them.

Documentation: http://www.lexicon.net/sjmachin/xlrd.html#xlrd.xldate_as_tuple-function

Here's a generic example:

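import datetime

import xlrd

book = xlrd.open_workbook("myexcelfile.xls")
sh = book.sheet_by_index(0)
# Raw cell value: a float such as 43704.375 for a date-formatted cell
a1 = sh.cell_value(rowx=0, colx=0)
# xldate_as_tuple needs the workbook's datemode (1900 vs 1904 date system)
a1_as_datetime = datetime.datetime(*xlrd.xldate_as_tuple(a1, book.datemode))
print('datetime: %s' % a1_as_datetime)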

If you create the file myexcelfile.xls, enter a date in cell A1 and run the code above, a1_as_datetime should hold the correct datetime value.
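To apply the same idea to a whole row dict like the one in the question, a minimal sketch follows. It reimplements the serial-to-datetime step with the standard library only, for both Excel date systems keyed by xlrd's `datemode` (0 = 1900 system, 1 = 1904 system). The `DATE_KEYS` tuple and the `convert_row_dates` helper are hypothetical: they simply list the date-typed columns visible in the question's example row. With xlrd itself you would normally check `sheet.cell(r, c).ctype == xlrd.XL_CELL_DATE` instead of hard-coding column names. Also note that xlrd 2.0 and later only read legacy `.xls` files, so for the question's `.xlsx` you would need an older xlrd or a library such as openpyxl, which already returns `datetime` objects for date-formatted cells.

```python
from datetime import datetime, timedelta

# Day 0 for each Excel date system, keyed like xlrd's book.datemode.
# The 1900 epoch of 1899-12-30 is only valid for dates from 1900-03-01 on.
EPOCHS = {0: datetime(1899, 12, 30), 1: datetime(1904, 1, 1)}

# Hypothetical: the date-typed columns seen in the question's example row.
DATE_KEYS = (
    "marketing authorisation date",
    "date of refusal of marketing authorisation",
    "date of opinion",
    "decision date",
    "first published",
    "revision date",
)


def convert_row_dates(row: dict, datemode: int = 0) -> dict:
    """Return a copy of row with Excel serial floats in DATE_KEYS turned into datetimes."""
    epoch = EPOCHS[datemode]
    out = dict(row)
    for key in DATE_KEYS:
        value = out.get(key)
        # Empty cells come back as '' from xlrd, so only convert real numbers.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            # Round to whole seconds to drop float noise in the day fraction.
            out[key] = epoch + timedelta(seconds=round(value * 86400))
    return out


if __name__ == "__main__":
    row = {"revision date": 43704.375, "date of opinion": "", "category": "Human"}
    converted = convert_row_dates(row)
    print(converted["revision date"].strftime("%m/%d/%Y %I:%M:%S %p"))
```

For serials before 1 March 1900 in the 1900 system, xlrd's own `xldate_as_tuple` is the safer choice, since it treats that range as ambiguous and raises an error rather than returning a silently shifted date.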

michael