Read .xlsx file from s3 bucket and convert each tab to csv with tab name and upload to bucket

Question

import pandas as pd 
import xlrd 
import openpyxl 
from io import StringIO  
import boto3 

def lambda_handler(event,context):  
    df=pd.read_excel('s3://my-bucket/XL/test-xls.xlsx', engine='openpyxl')    
    bucket = 'my-bucket'   
    csv_buffer = StringIO() 
    df.to_csv(csv_buffer) 
    s3_resource = boto3.resource('s3') 

    # write the data back as a CSV 
    s3_resource.Object(bucket,'XL/test-csv.csv').put(Body=csv_buffer.getvalue())

Above code is working fine for one excel but I am searching for solution where I can read XLSX file
If XLSX file has 3 tab then those 3 tabs should get converted into 3 different CSV and save file into bucket with tabname.csv

For reading multiple sheets from the same workbook (2), see if [this post](https://stackoverflow.com/questions/26521266/using-pandas-to-pd-read-excel-for-multiple-worksheets-of-the-same-workbook) helps — Jeremy, Apr 23 '22 at 17:48

score 0 · Answer 1 · answered Apr 23 '22 at 17:40

0

You can try xlsx2csv instead of pandas. xlsx2csv with -n option might work.

Also xlsx2csv have more options for sheet tabs, you can choose what suits you.

answered Apr 23 '22 at 17:40

sauraj

25
8

Dennis Meissel · Answer 2 · 2023-06-16T15:07:17.220

To read a particular sheet you just need to pass it as a parameter to the read_excel function:

excel_object = s3.get_object(Bucket=source_bucket, Key=source_key)['Body'].read()
df_sheet1 = pd.read_excel(excel_object, sheet_name='name_of_sheet1')
df_sheet2 = pd.read_excel(excel_object, sheet_name='name_of_sheet2')

Note: You can also use the 0-based index of the sheet

If you want to extract every sheet, you can get the sheet names using the .sheet_names property

excel = pd.ExcelFile(s3_client.get_object(Bucket=source_bucket, Key=source_key)['Body'].read())
names = excel.sheet_names
for name in names:
  sheet = pd.read_excel(excel, sheet_name=name)

and extract every sheet in a loop.

pandas.read_excel

Read .xlsx file from s3 bucket and convert each tab to csv with tab name and upload to bucket

2 Answers2