2

I have a large excel file with 90k rows, and I want to add only the rows that have red colored text to a dataframe (using styleframe). The code below works if I use a small excel file with 5 rows, but when I attempt to use it with a larger file the data frame is always empty.

even if I remove the dropna I get a styleframe with all Nans, and no reds.

sf = StyleFrame.read_excel('myFile.xlsx', read_style=True, use_openpyxl_styles=False, usecols = ['COLUMN_1'], header = 2)

.

def only_cells_with_red_text(cell):
    return cell if cell.style.font_color in {utils.colors.red, 'FFFF0000'} else np.nan

.

sf_2 = StyleFrame(sf.applymap(only_cells_with_red_text).dropna(axis=(0, 1), how='all'))

I expected only cells with red text to be added to dataframe

The output is Empty DataFrame

Columns: []
Index: []
ap95btg
  • 27
  • 3

1 Answers1

0

It's a bug in StyleFrame. The usecols and header kwargs change the shape of the dataframe (since they cause pd.read_excel to return a subset of the dataframe/sheet). When read_excel then applies the styles it applies the styles to the wrong cells (simply put, it applies the styles based on the location of the cells in the original, entire sheet).

For now, the "workaround" is to remove usecols = ['COLUMN_1'], header=2 (much less efficient, of course) and do the filtering later, ie

sf = sf[['COLUMN_1']]

until I (I'm one of the authors of StyleFrame) find a way to overcome this.

DeepSpace
  • 78,697
  • 11
  • 109
  • 154
  • I'm still getting an empty syleframe. It seems the data is not loading the color. If I iterate through the rows I get that all rows have a '000000' font color. for row in sframe['COLUMN_1']: print(row.value, row.style.font_color) output: FRED 000000 nan 000000 nan 000000 nan 000000 nan 000000 DAYANA 000000 nan 000000 nan 000000 GEORGE 000000 – ap95btg Jun 17 '19 at 19:45