1

I have an excel file with many rows/columns and when I convert the file directly from .xlsx to .txt with excel, the file ends up with a weird indentation (the columns are not perfectly aligned like in an excel file) and due to some requirements, I really need them to be.

So, is there a better way to write from excel to txt using python? or format the txt file so the columns perfectly align?

I found this code in a previous question but I am getting the following error:

TypeError: a bytes-like object is required, not 'str'

Code:

import xlrd
import csv

# open the output csv
with open('my.csv', 'wb') as myCsvfile:
    # define a writer
    wr = csv.writer(myCsvfile, delimiter="\t")

    # open the xlsx file 
    myfile = xlrd.open_workbook('myfile.xlsx')
    # get a sheet
    mysheet = myfile.sheet_by_index(0)

    # write the rows
    for rownum in range(mysheet.nrows):
        wr.writerow(mysheet.row_values(rownum))
Roshan Br
  • 362
  • 2
  • 17

3 Answers3

2

is there a better way to write from excel to txt using python?

I'm not sure if it's a better way, but you could write the contents of xlsx file to txt this way:

import pandas as pd

with open('test.txt', 'w') as file:
    pd.read_excel('test.xlsx').to_string(file, index=False)

Edit:

to convert date column to a desired format, you could try the following:

with open('test.txt', 'w') as file:
    df = pd.read_excel('test.xlsx')
    df['date'] = pd.to_datetime(df['date']).dt.strftime('%Y%m%d')
    df.to_string(file, index=False, na_rep='')
help-ukraine-now
  • 3,850
  • 4
  • 19
  • 36
  • yess! the notepad shows the correct columns, thank you a lot! the date columns are showing full format like 2018-11-06 00:00:00 and also the empty rows show NaN. Do you know a way to stop it from doing that? (leaving dates as 20181106 as it was and NaN as blank spaces) –  Aug 26 '19 at 08:31
  • @BrunnaT. to leave NaNs as blank spaces, add `na_rep=''` argument to a `.to_string`, so that it becomes `to_string(file, index=False, na_rep='')` – help-ukraine-now Aug 26 '19 at 08:34
  • @BrunnaT. as for the date column showing full format, I'm not sure, since in the example data I have a column `date` with a value like `20181106` is still `20181106` after saving to `txt` – help-ukraine-now Aug 26 '19 at 08:37
  • @BrunnaT. see the answer, I tried to fix the date column – help-ukraine-now Aug 26 '19 at 08:45
0

The problem lies in this row:

with open('my.csv', 'wb') as myCsvfile:

'wb' suggests you will be writing bytes, but in reality, you will be writing regular characters. Change it to 'w'. Perhaps the best practice would be to also use with block for Excel file:

import xlrd
import csv

# open the output csv
with open('my.csv', 'w') as myCsvfile:
    # define a writer
    wr = csv.writer(myCsvfile, delimiter="\t")

    # open the xlsx file 
    with xlrd.open_workbook('myfile.xlsx') as myXlsxfile:
        # get a sheet
        mysheet = myXlsxfile.sheet_by_index(0)
        # write the rows
        for rownum in range(mysheet.nrows):
            wr.writerow(mysheet.row_values(rownum))
Metin Atalay
  • 1,375
  • 18
  • 28
  • thanks for the answer! the output is still not indented correctly like columns however, the dates are being converted to excel enconding like 43400 and the it's adding .0 to all the numbers haha I am not sure how to make this txt file actually correctly indented –  Aug 26 '19 at 08:23
  • You can try using method shown here: [link](https://stackoverflow.com/questions/13962837/reading-date-as-a-string-not-float-from-excel-using-python-xlrd). However then you will have to iterate over columns as well as the rows. – Aljaž Medič Aug 26 '19 at 08:39
0
import pandas as pd

read_file = pd.read_excel (r'your excel file name.xlsx', sheet_name='your sheet name')
read_file.to_csv (r'Path to store the txt file\File name.txt', index = None, header=True)
pzaenger
  • 11,381
  • 3
  • 45
  • 46