
I am using a dataset of 60,000 rows. Reading the xlsx file and then converting it into a CSV takes 6.4 seconds. How can I reduce the time? My code:

import pandas as pd
import time


def read_xlsx(path):
    df = pd.read_excel(path)
    return df


def convert_to_csv(df):
    # note: by default to_csv also writes the DataFrame index as an extra column
    df.to_csv('orders_csv_file.csv')


if __name__ == '__main__':
    start = time.perf_counter()   # time.clock() was removed in Python 3.8; perf_counter() is the replacement
    df = read_xlsx("/home/arima/sublime_workspace/orders.xlsx")
    print(time.perf_counter() - start)   # time spent parsing the workbook

    start = time.perf_counter()
    convert_to_csv(df)
    print(time.perf_counter() - start)   # time spent writing the CSV

Reading the Excel file is where the time goes (about 6 seconds); converting it to CSV takes only about 0.3 seconds.
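
If the same workbook has to be loaded repeatedly, one option is to pay the ~6-second .xlsx parse only once and cache the parsed DataFrame in a faster on-disk format using pandas' pickle helpers. This is only a minimal sketch, not something from the question; the cache file name orders_cache.pkl is hypothetical, and the xlsx path is the one from the question.

import os
import pandas as pd

XLSX_PATH = "/home/arima/sublime_workspace/orders.xlsx"  # path from the question
CACHE_PATH = "orders_cache.pkl"                          # hypothetical cache file


def load_orders(xlsx_path=XLSX_PATH, cache_path=CACHE_PATH):
    # Reuse the cached DataFrame if it is newer than the workbook,
    # so the expensive .xlsx parse only happens on the first run
    # (or after the workbook changes).
    if os.path.exists(cache_path) and os.path.getmtime(cache_path) >= os.path.getmtime(xlsx_path):
        return pd.read_pickle(cache_path)

    df = pd.read_excel(xlsx_path)   # the slow step (~6 s in the question)
    df.to_pickle(cache_path)        # later loads read this instead
    return df

This does not make the Excel parse itself faster; it only avoids repeating it.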

  • Tough question. Both reading & writing Excel files are slow, and for a reason: .xlsx files are compressed and require decoding. I'm not sure you're going to find a faster solution. – jpp Apr 09 '18 at 11:18
  • Reading the Excel file itself is what takes more time; I have updated the question. – Sidhartha Apr 09 '18 at 11:22
  • Is Python a requirement? I think it's unlikely you'll find a faster solution. – jpp Apr 09 '18 at 11:27
  • I need to speed up the process if it is possible with Python or pandas (see the sketch after these comments). – Sidhartha Apr 09 '18 at 11:31
  • In that case, this is a duplicate. If you are not happy with the answer provided in the dup, consider offering a bounty. But I don't think you will find better answers. – jpp Apr 09 '18 at 11:34
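
As jpp notes, the decompression and parsing step inside read_excel is the real cost, so there may be little headroom in pandas itself. If only part of the sheet is actually needed, a hedged option is to restrict what pandas parses via the usecols parameter of read_excel; whether this helps depends on how wide the sheet is. The column range "A:C" below is purely illustrative.

import pandas as pd

# Hypothetical: parse only the columns actually needed (here A through C),
# which can reduce work if the workbook has many unused columns.
df = pd.read_excel(
    "/home/arima/sublime_workspace/orders.xlsx",
    usecols="A:C",   # Excel-style column range; adjust to the real columns
)
df.to_csv("orders_csv_file.csv", index=False)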
