0

I'm trying to color cells red within specified columns of a large XLSX file depending on if the Value in column U is "N".

I'm able to achieve this with my code below but it's taking over 20 minutes, would anyone have any suggestions on how I can improve performance to make this run faster?

import pandas as pd
import time

startTime = time.time()

filePath = r"C:\\Users\\Desktop\\Match\\"
parameters = 'large_file.xlsx'

df = pd.read_excel(filePath + parameters)

cusip_match = 'N'

#color columns U, Units & units.1 based on if there is a "N" Cell in column U.
colorMatch = df.style\
    .apply(lambda x: ['background-color: red' if x == cusip_match else x for x in df.U],subset=['U'])\
    .apply(lambda x: ['background-color: red' if x == cusip_match else x for x in df.U],subset=['Units'])\
    .apply(lambda x: ['background-color: red' if x == cusip_match else x for x in df.U],subset=['Units.1'])

#output File
colorMatch.to_excel(r"C:\\Users\\Desktop\\Match\\Large_Compare_Matches.xlsx")

executionTime = (time.time() - startTime)
print('Execution time in seconds: ' + str(executionTime))
Henry Ecker
  • 34,399
  • 18
  • 41
  • 57
Nantourakis
  • 107
  • 1
  • 8
  • Your lambda function is called for every row and loops through all values of the column. You can rewrite this without the loops it will be faster – SiP Mar 15 '22 at 13:00
  • A sample input DataFrame and expected output is always helpful to establish what exactly the goal is. It's also important to note that repr on large DataFrames is one of the known [limitations](https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html#Limitations) so it's possible that the sheet is just too big. read_excel and to_excel are also known to be slow functions (_e.g._ [Faster way to read Excel files to pandas dataframe](https://stackoverflow.com/q/28766133/15497888)) – Henry Ecker Mar 20 '22 at 00:59

1 Answers1

0
DataFrame(df['U']).style.apply(lambda x: 'background-color: red' if x == cusip_match else x)
SiP
  • 1,080
  • 3
  • 8
  • Thanks for helping me look into this, I get the following error when giving this a try: AttributeError: 'Series' object has no attribute 'style' – Nantourakis Mar 16 '22 at 19:51
  • I assumed since the df has style the series also would. You could try recasting the series to df maybe? – SiP Mar 17 '22 at 10:36
  • @Nantourakis did it work for you? – SiP Mar 19 '22 at 17:36