1
import pandas as pd 
import xlrd 
import openpyxl 
from io import StringIO  
import boto3 

def lambda_handler(event,context):  
    df=pd.read_excel('s3://my-bucket/XL/test-xls.xlsx', engine='openpyxl')    
    bucket = 'my-bucket'   
    csv_buffer = StringIO() 
    df.to_csv(csv_buffer) 
    s3_resource = boto3.resource('s3') 

    # write the data back as a CSV 
    s3_resource.Object(bucket,'XL/test-csv.csv').put(Body=csv_buffer.getvalue()) 
  1. Above code is working fine for one excel but I am searching for solution where I can read XLSX file
  2. If XLSX file has 3 tab then those 3 tabs should get converted into 3 different CSV and save file into bucket with tabname.csv
Jeremy
  • 661
  • 7
  • 19
Snownew
  • 33
  • 7
  • 1
    For reading multiple sheets from the same workbook (2), see if [this post](https://stackoverflow.com/questions/26521266/using-pandas-to-pd-read-excel-for-multiple-worksheets-of-the-same-workbook) helps – Jeremy Apr 23 '22 at 17:48

2 Answers2

0

You can try xlsx2csv instead of pandas. xlsx2csv with -n option might work.

Also xlsx2csv have more options for sheet tabs, you can choose what suits you.

sauraj
  • 25
  • 8
0

To read a particular sheet you just need to pass it as a parameter to the read_excel function:

excel_object = s3.get_object(Bucket=source_bucket, Key=source_key)['Body'].read()
df_sheet1 = pd.read_excel(excel_object, sheet_name='name_of_sheet1')
df_sheet2 = pd.read_excel(excel_object, sheet_name='name_of_sheet2')

Note: You can also use the 0-based index of the sheet

If you want to extract every sheet, you can get the sheet names using the .sheet_names property

excel = pd.ExcelFile(s3_client.get_object(Bucket=source_bucket, Key=source_key)['Body'].read())
names = excel.sheet_names
for name in names:
  sheet = pd.read_excel(excel, sheet_name=name)

and extract every sheet in a loop.

pandas.read_excel

Dennis Meissel
  • 1,825
  • 1
  • 21
  • 33