I second Romain's comments: you should limit the number of times the file is read, and iteration is best avoided when you can use set operations instead.
In my opinion you can set the row colour in a single pandas operation; you don't need to drop down into xlwings to do it.
I will set out some examples below to explain different approaches:
Option 1 - Iteration
import numpy as np
import pandas as pd
# Set up the requirements for the row to be coloured
# this will make more sense later
def color(row):
    if row["check"] == "matched":
        return ['background-color: red'] * len(row)
    return [''] * len(row)
# Note these are raw strings, so the Windows backslash path separator is not treated as an escape character
values_to_check = [r'C:\folder\somepath\1234_456_2.pdf', r'C:\folder\somepath\whatever\5932194_123.pdf']
df = pd.read_excel('data.xlsx', sheet_name='My Data')
# Add a blank column as a placeholder
df["check"] = ""
for i in range(len(df)):
    # this tests if any of the entries in the file list match the current record
    if any(df.loc[i, "value"] in x for x in values_to_check):
        df.loc[i, "check"] = "matched"
    else:
        df.loc[i, "check"] = "not matched"
# now we can apply the colour option
# associate a styler object with the dataframe
styler = df.style
# apply the colour function to select and change the rows
styler.apply(color, axis=1)
# use ExcelWriter rather than calling to_excel directly in order to get access to the append and replace options
with pd.ExcelWriter("data.xlsx", engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    styler.to_excel(writer, 'My Data', index=False)
This gives an output with an additional column flagging whether or not each row is matched.
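As an aside, the explicit index loop can be collapsed into a single column assignment; a minimal sketch, assuming the "value" column is (or can be cast to) text:
# build the whole "check" column in one assignment instead of writing cell by cell
df["check"] = [
    "matched" if any(str(v) in x for x in values_to_check) else "not matched"
    for v in df["value"]
]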

Option 2 - Set operations (Pandas merge)
import numpy as np
import pandas as pd
import pathlib
def color_joined(row):
    if row["_merge"] == "both":
        return ['background-color: red'] * len(row)
    return [''] * len(row)
def clean_inputs(input_item: str) -> int:
    # Using PureWindowsPath rather than Path to handle the backslashes
    # .stem returns the filename only, with no path or extension
    # strip the underscores so that int comparisons work, based on your comments
    return int(pathlib.PureWindowsPath(input_item).stem.replace('_', ''))
values_to_check = [r'C:\folder\somepath\1234_456_2.pdf', r'C:\folder\somepath\whatever\5932194_123.pdf']
# Let's have only the filenames, and without the underscores, as int
# you may need to fiddle with this a bit to match your real-world data
cleaned_filenames = [ clean_inputs(x) for x in values_to_check ]
# No need to invent a blank check column here
df = pd.read_excel('data.xlsx', sheet_name='My Data')
# Instead, convert the value list into a dataframe too
lookup_list = pd.DataFrame(cleaned_filenames, columns=['value'])
# this uses a left join and leaves a flag
joined_df = df.merge(lookup_list, on='value', how='left', indicator=True)
# the result is a df with all of the records, plus a column called "_merge"
# the values of this column will be either "left_only" for no match or "both" for a match
styler = joined_df.style
styler.apply(color_joined, axis=1)
# Drop the _merge column by writing out only the specified columns
with pd.ExcelWriter("output.xlsx", engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    styler.to_excel(writer, 'Merged', index=False, columns=['title', 'description', 'value', 'extra_column'])
This gives the same output as above. In theory it should be more efficient than looping over the rows, but, as always, you should test performance on your specific data.
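One rough way to do that comparison is to wrap each approach in a function and time it with the standard-library timeit module (a sketch only; option1 and option2 are hypothetical wrappers around the two code blocks above):
import timeit

def option1():
    ...  # placeholder: the iteration approach from Option 1

def option2():
    ...  # placeholder: the merge approach from Option 2

# each callable is run 10 times and the total elapsed time printed
print(timeit.timeit(option1, number=10))
print(timeit.timeit(option2, number=10))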
Notes:
If you wanted a list of only the matches, with option 2 this could be done with s1 = pd.merge(df, lookup_list, how='inner', on=['value']).
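For example, the inner join keeps only the rows whose value appears in the lookup list, and that subset could be written to its own sheet (a sketch; the 'Matches' sheet name is just an illustration):
s1 = pd.merge(df, lookup_list, how='inner', on=['value'])
with pd.ExcelWriter("output.xlsx", engine="openpyxl", mode="a", if_sheet_exists="replace") as writer:
    # only the matched records end up in this sheet
    s1.to_excel(writer, 'Matches', index=False)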
In theory, you should be able to drop the columns before writing to Excel by using styler.hide(subset=['_merge', 'check'], axis="columns"); however, I couldn't get this to work in my test. See the styler.hide documentation for details.
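One workaround, if you hit the same problem, is to capture the match flag before styling, drop _merge from the frame, and colour rows from the saved mask (a sketch only, not tested against your data):
# keep a boolean Series of which rows matched, indexed like joined_df
matched = joined_df["_merge"].eq("both")
# drop the helper column so it never reaches the output file
to_write = joined_df.drop(columns="_merge")
styler = to_write.style.apply(
    lambda row: ['background-color: red'] * len(row) if matched[row.name] else [''] * len(row),
    axis=1,
)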
You can save memory (and speed up processing) by specifying the data type of each column when reading (e.g. int rather than object), as the default for text-like columns is the generic object dtype.
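For example, pd.read_excel accepts a dtype mapping; the column names below are just placeholders for whatever your sheet actually contains:
df = pd.read_excel(
    'data.xlsx',
    sheet_name='My Data',
    # hypothetical column names - substitute your own
    dtype={'value': 'int64', 'title': 'string', 'description': 'string'},
)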