Pandas ExcelFile read columns as string

Question

I know that you can specify data types when reading excels using pd.read_excel (as outlined here). Can you do the same using pd.ExcelFile?

I have the following code:

 if ".xls" in 
     xl = pd.ExcelFile(path + "\\" + name, )
     for sheet in xl.sheet_names:
         xl_parsed = xl.parse(sheet)

When parsing the sheet, some of the values in the columns are displayed in scientific notation. I don't know the column names before loading so I need to import everything as string. Ideally I would like to be able to do something like xl_parsed = xl.parse(sheet, dtype = str). Any suggestions?

Please make sure that you refer to [the documentation](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.ExcelFile.parse.html) before asking. The accepted answer in the question you linked shows that `read_excel` takes a `converters` parameter. So does `pd.ExcelFile.parse`. Did you try it? — roganjosh, Mar 19 '18 at 22:04
Although I should have read the end of your question more closely as that deals with something not covered by the docs. There was an issue about it [here](https://github.com/pandas-dev/pandas/issues/5891) which got closed but doesn't seem to have been resolved in any proper way. There is one approach in there that you might be able to adapt; open the Excel in xlrd, inspect the headers to get the names, and create a dictionary comprehension to then pass as `converters`. Kinda ugly. — roganjosh, Mar 19 '18 at 22:20
@roganjosh thanks for the input, that's what i did! I'll post it as an answer — michal111, Mar 27 '18 at 10:28
The solution you suggested is supported by the `parse` method although it is not mentioned explicitly on the documentation, so you can just use `xl_parsed = xl.parse(sheet, dtype=str)` — Luiscri, Nov 29 '21 at 11:43

score 3 · Answer 1 · answered Jan 15 '20 at 18:31

If you would prefer a cleaner solution, I used the following:

excel = pd.ExcelFile(path)
for sheet in excel.sheet_names:
    columns = excel.parse(sheet).columns
    converters = {column: str for column in columns}

    data = excel.parse(sheet, converters=converters)

score 1 · Answer 2 · edited Nov 09 '21 at 18:42

1

I went with roganjosh's suggestion - open the excel first, get column names and then pass as converter.

if ".xls" in name:
    xl = pd.ExcelFile(path)
    sheetCounter = 1
        for sheet in xl.sheet_names:
        ### Force to read as string ###
        column_list = []
        df_column = pd.read_excel(path, sheetCounter - 1).columns
        for i in df_column:
            column_list.append(i)
        converter = {col: str for col in column_list}
        ##################
        xl_parsed = xl.parse(sheet, converters=converter)
        sheetCounter = sheetCounter + 1

edited Nov 09 '21 at 18:42

ggorlen

44,755
7
76
106

answered Mar 27 '18 at 10:31

michal111

400
4
18

Thanks for the answer. `column_list` seems pointless, you can just use `df_column` directly. You can also use `for i, sheet in enumerate(xl.sheet_names):` instead of `sheetCounter`. I assume the `for sheet in ...` should be indented -4 spaces. – ggorlen Nov 09 '21 at 18:49

Pandas ExcelFile read columns as string

2 Answers2