1

I've tried a few methods, including pandas:

df = pd.read_excel('file.xlsx')
df.to_csv('file.csv')

But every time I convert my xlsx file over to csv format, I lose all data within columns that include a formula. I have a formula that concatenates values from two other cells + '@domain' to create user emails, but this entire column returns blank in the csv product. The formula is basically this:

=CONCATENATE(B2,".",E2,"@domain")

The conversion is part of a larger code workflow, but it won't work if this column is left blank. The only thing I've tried that worked was this API, but I'd rather not pay a subscription if this can be done locally on the machine.

Any ideas? I'll try whatever you throw at me - bear in mind I'm new to this, but I will do my best!

Alex Waygood
  • 6,304
  • 3
  • 24
  • 46
TheCrazySwede
  • 113
  • 2
  • 10

1 Answers1

1

You can try to open the excel file with the openpyxl library in the data-only mode. This will prevent the raw formulas - they are going to be calculated just the way you see them in excel itself.

import openpyxl 

wb = openpyxl.load_workbook(filename, data_only=True)

Watch out when youre working with you original file and save it with the openpyxl-lib in the data-only-mode all your formulas will be lost. I had this once and it was horrible. So i recommend using a copy of your file to work with.

Since you have your xlsx-file with values only you can now use the internal csv library to generate a proper csv-file (idea from this post: How to save an Excel worksheet as CSV):

import csv

sheet = wb.active # was .get_active_sheet()
with open('test.csv', 'w', newline="") as f:
    c = csv.writer(f)
    for r in sheet.iter_rows(): # generator; was sh.rows
        c.writerow([cell.value for cell in r])
MegaMinx
  • 109
  • 7
  • I tried that, but it still didn't work - the CSV file still dropped the rows in the column that had the formula in it. I appreciate the suggestion, though! – TheCrazySwede Sep 23 '21 at 15:15