0

I am hosting a flask application on EC2. The application is written based on and with libraries for Python 3.9, but the Red Hat EC2 instance is capped at Python 3.6. The EC2 is on a private network inside the organisation.

In my code, I used Pandas to open an openpyxl sheet and process it into a csv using the following code:

with open(filename, 'rb') as f:
                    excel = msoffcrypto.OfficeFile(f)
                    excel.load_key(password=app.config['PASS']) 
                    excel.decrypt(temp) #removes password off excel
                    nw = openpyxl.load_workbook(temp)
                    sheetNames = nw.sheetnames
                    selectedsheet = sheetNames[int(app.config['SHEET'])] #grabs the required sheet
                    print('route1:pass1')
                    data = pd.read_excel(   # <-- This defaults to xlrd which cannot read xlsm files
                        temp,
                        sheet_name=int(reqs.sheet),
                        header=int(reqs.header)
                    ) 
                    print('route1:pass2')
                    df = pd.DataFrame(data)
                    print('route1:pass3')
                    e_name = reqs.warehouse.replace(
                        ' ', "_") + "_" + selectedsheet.replace(
                        ' ', "_") + "_" + str(currentMonth) + "_" + str(currentYear) + ".csv"
                    print(e_name)
                    file.save(os.path.join(app.config['UPLOAD_FOLDER'], e_name))
                    print('route1:pass4')
                    csvname = 'uploads/' + e_name
                    if reqs.final_name != '':
                        os.remove(reqs.final_name)
                    reqs.final_name = csvname
                    df.to_csv(csvname, encoding='utf-8', index=False)
                    reqs.e_name = e_name

This works fine on my localhost using 3.9, but when running on 3.6, it causes issues.

I'm using Pandas version 1.1.5 which is the highest that Python 3.6 supports. This version of Pandas uses xlrd as the default engine for reading Excel files. However, using the most updated version of xlrd which is 2.0.1 only supports xls files while my goal is to process xlsm files.

This version of Pandas also doesn't allow me to use an alternative engine, I tried adding the engine='openpyxl' to my pd.read_excel() and it still defaulted to xlrd to read the file.

I can downgrade xlrd to 1.2.0, but for security reasons, this seems like a bad idea. I don't have enough time to write this all from scratch, so i'm not sure what I should do.

Have I missed something? What are my options? Am I wrong in the outline of what the issue is above?

My only option is to downgrade the xlrd or write this from scratch on 3.6, but I have a deadline, so I don't think I have the time.

davidism
  • 121,510
  • 29
  • 395
  • 339
Dzartz94
  • 11
  • 3
  • 1. Can wrap everything in docker. 2. Can use anaconda for python3.9+ in virtual environment. 3. Use pyenv for version management. – Pranjal Doshi Jan 18 '23 at 15:11
  • What do you mean "capped at Python 3.6" can you not install Python 3.9? An OS might come with certain versions of Python installed that doesn't mean you can't install other versions. – Abdul Aziz Barkat Jan 18 '23 at 15:15
  • @AbdulAzizBarkat so the yum list for redshift only has python version 3.6.8. We can install newer versions but it would have to be done manually. It is not a time-restrictive solution, unfortunately. – Dzartz94 Jan 18 '23 at 15:23

0 Answers0