4

I have an excel sheet with more than 1000 columns and 300 rows. Some these cells have normal data, while some cells have a background color of red and some cells have normal white background but the text is red in color. For example, my excel sheet looks like below:

enter image description here

I am reading this excel sheet into Python (pandas) to use it as a dataframe and perform further actions on it. However, the red text and red cells need to be treated differently than the normal cells.

Therefore, I would like to split the above table into 3 tables, such that: Table one has all the cells but the red background cells are empty. Table 2 has only those rows and columns where the text is red. Table 3 has only those rows and columns where the background is red.

I guess it cannot be done in Pandas. I tried using StyleFrame but failed.

Can anyone help in this regard? Is there any python package that is helpful in this case?

Community
  • 1
  • 1
Hanif
  • 377
  • 4
  • 19
  • 1
    You can use `StyleFrame.read_excel` with argument `use_openpyxl_styles=False` to be able to use pandas style indexing based on cells' style, as suggested in the docs: https://styleframe.readthedocs.io/en/latest/api_documentation.html#read-excel – DeepSpace Sep 04 '18 at 09:07
  • I did that. Now I have the complete dataframe read into a variable. Now the question is how do i filter out the cells based on their formatting. – Hanif Sep 04 '18 at 09:13

1 Answers1

4

This is pretty much the way to achieve this. It is not pretty as StyleFrame wasn't really designed to be used this way.

Reading the source Excel file

import numpy as np
from StyleFrame import StyleFrame, utils

sf = StyleFrame.read_excel('test.xlsx', read_style=True, use_openpyxl_styles=False)

1) All cells but cells with red background are empty

def empty_red_background_cells(cell):
    if cell.style.bg_color in {utils.colors.red, 'FFFF0000'}:
        cell.value = np.nan
    return cell

sf_1 = StyleFrame(sf.applymap(empty_red_background_cells))    
print(sf_1)
#      C1       C2 C3    C4      C5      C6
# 0    a1      1.0  s   nan  1001.0  1234.0
# 1    a2     12.0  s   nan  1001.0  4322.0
# 2    a3      nan  s   nan  1001.0  4432.0
# 3    a4    232.0  s   nan  1001.0  4432.0
# 4    a5    343.0  s  99.0     nan     nan
# 5    a6      3.0  s  99.0  1001.0  4432.0
# 6    a7     34.0  s  99.0  1001.0  4432.0
# 7    a8      5.0  s   nan  1001.0  4432.0
# 8    a9      6.0  s  99.0  1001.0  4432.0
# 9   a10    565.0  s  99.0     nan  4432.0
# 10  a11   5543.0  s  99.0  1001.0  4432.0
# 11  a12    112.0  s  99.0  1001.0     nan
# 12  a13  34345.0  s  99.0  1001.0  4432.0
# 13  a14      0.0  s  99.0     nan     nan
# 14  a15    453.0  s  99.0  1001.0     nan

2) Only cells with red text

def only_cells_with_red_text(cell):
    return cell if cell.style.font_color in {utils.colors.red, 'FFFF0000'} else np.nan

sf_2 = StyleFrame(sf.applymap(only_cells_with_red_text).dropna(axis=(0, 1), how='all'))
# passing a tuple to pandas.dropna is deprecated since pandas 0.23.0, but this can be
# avoided by simply calling dropna twice, once with axis=0 and once with axis=1

print(sf_2)
#         C2      C6
# 7     nan   4432.0
# 8     nan   4432.0
# 9    565.0     nan
# 10  5543.0     nan
# 11   112.0     nan

3) Only cells with red background

def only_cells_with_red_background(cell):
    return cell if cell.style.bg_color in {utils.colors.red, 'FFFF0000'} else np.nan

sf_3 = StyleFrame(sf.applymap(only_cells_with_red_background).dropna(axis=(0, 1), how='all'))
# passing a tuple to pandas.dropna is deprecated since pandas 0.23.0, but this can be
# avoided by simply calling dropna twice, once with axis=0 and once with axis=1

print(sf_3)
#        C4      C6
# 0    99.0     nan
# 1    99.0     nan
# 2    99.0     nan
# 3    99.0     nan
# 13    nan  4432.0
# 14    nan  4432.0
DeepSpace
  • 78,697
  • 11
  • 109
  • 154
  • There was one error though, in sf_3, as there I get an empty dataframe with the columns names as you have specified. I managed to solve this issue by first extracting sf_2 and sf_3 and in the last sf_1. Don't know why it happens as we are using different dataframe names to copy the data to. – Hanif Sep 04 '18 at 11:25
  • You made some changes to the code due to which it does not work anymore properly. The issue occurs in (1). The code/function empties the next row and not the one that has red text. I am confirming it after running the code on a number of file with 1000x2000 columns/rows and performing the QC. Can you suggest any trick here? – Hanif Sep 06 '18 at 16:11
  • @Hanif (1) empties cells with red background, not red text – DeepSpace Sep 06 '18 at 17:11
  • Sorry I mean to say red background. I tried editing the function to see if it works on red text cells by replacing cell.style.bg_color with cell.style.font_color and get the same problem. The function somehow selects the next immediate row after the row that has either red background or red text. I tried an earlier version of your function in this post and it worked perfectly on a smaller table but I lost that code due some technical reason. Is it possible to state your initial code function somewhere here, maybe that helps? or if you have any better suggestion? – Hanif Sep 06 '18 at 18:51
  • 1
    @Hanif You can see the edit history of my answer [here](https://stackoverflow.com/posts/52164554/revisions), but I never edited (1), and I don't see why it would empty the row after, especially after you verified it was working as expected on a smaller table / different file. I suspect the difference is in the files and not the code. If you still face the same issue I think it would be better for us to continue this discussion on Github as a [Github issue](https://github.com/DeepSpace2/StyleFrame/issues/new) – DeepSpace Sep 06 '18 at 19:08
  • 1
    Thanks for the prompt feedback. I will check the code once again tomorrow and if the issue persists then will continue the discussion on Git. – Hanif Sep 06 '18 at 19:15
  • @DeepSpace I am trying to delete the values of a column having a red background. And I am getting an error saying `AttributeError: 'Styler' object has no attribute 'dropna'` can you please help me with where am I wrong. – Twinkle Lahariya Jan 19 '20 at 04:36
  • @TwinkleLahariya please ask a new question with the exact code you are using and the error you recieve – DeepSpace Jan 19 '20 at 11:27