2

I am looking to colour specific cells from two columns that don't match, but would like to use it with python pandas style.where and export in excel using openpyxl.

My code so far:

df = pd.DataFrame({
    'config_dummy1': ["dummytext"] * 100,
    'config_size_x': ["textstring"] * 100,
    'config_size_y': ["textstring"] * 100,
    'config_dummy2': ["dummytext"] * 100
})
df.at[50, 'config_size_x'] = "xandydontmatch"
df.at[99, 'config_size_y'] = "xandydontmatch"
print(df)
df.style.where(
    df['config_size_x'] != df['config_size_y'],
    'color: #ffffff; background-color: #ba3018',
    other=''
).to_excel('styled.xlsx', engine='openpyxl')

I am stuck, as it produces an error:

Traceback (most recent call last):
  File "python-match-csv.py", line 205, in <module>
    main2()
  File "python-match-csv.py", line 131, in main2
    ).to_excel('styled.xlsx', engine='openpyxl')
  File "F:\Python36\lib\site-packages\pandas\io\formats\style.py", line 175, in to_excel
    engine=engine)
  File "F:\Python36\lib\site-packages\pandas\io\formats\excel.py", line 652, in write
    freeze_panes=freeze_panes)
  File "F:\Python36\lib\site-packages\pandas\io\excel.py", line 1390, in write_cells
    for cell in cells:
  File "F:\Python36\lib\site-packages\pandas\io\formats\excel.py", line 617, in get_formatted_cells
    self._format_body()):
  File "F:\Python36\lib\site-packages\pandas\io\formats\excel.py", line 529, in _format_regular_rows
    for cell in self._generate_body(coloffset):
  File "F:\Python36\lib\site-packages\pandas\io\formats\excel.py", line 601, in _generate_body
    styles = self.styler._compute().ctx
  File "F:\Python36\lib\site-packages\pandas\io\formats\style.py", line 514, in _compute
    r = func(self)(*args, **kwargs)
  File "F:\Python36\lib\site-packages\pandas\io\formats\style.py", line 604, in _applymap
    result = self.data.loc[subset].applymap(func)
  File "F:\Python36\lib\site-packages\pandas\core\frame.py", line 6072, in applymap
    return self.apply(infer)
  File "F:\Python36\lib\site-packages\pandas\core\frame.py", line 6014, in apply
    return op.get_result()
  File "F:\Python36\lib\site-packages\pandas\core\apply.py", line 318, in get_result
    return super(FrameRowApply, self).get_result()
  File "F:\Python36\lib\site-packages\pandas\core\apply.py", line 142, in get_result
    return self.apply_standard()
  File "F:\Python36\lib\site-packages\pandas\core\apply.py", line 248, in apply_standard
    self.apply_series_generator()
  File "F:\Python36\lib\site-packages\pandas\core\apply.py", line 277, in apply_series_generator
    results[i] = self.f(v)
  File "F:\Python36\lib\site-packages\pandas\core\frame.py", line 6070, in infer
    return lib.map_infer(x.astype(object).values, func)
  File "pandas/_libs/src\inference.pyx", line 1472, in pandas._libs.lib.map_infer
  File "F:\Python36\lib\site-packages\pandas\io\formats\style.py", line 671, in <lambda>
    return self.applymap(lambda val: value if cond(val) else other,
TypeError: ("'Series' object is not callable", 'occurred at index config_dummy1')

TypeError: ("'Series' object is not callable", 'occurred at index config_dummy1'

I am open to suggestions other than .where(), I also tried to do this with .apply() but failed.

Note: the column index position is not fixed, it could be config_size_x, config_dummy1, config_dummy2, config_size_y or any other combination

Note 2: using windows and python 3.6 if it matters

Savvas Radevic
  • 543
  • 1
  • 8
  • 23
  • `the column index position is not fixed, it could be config_size_x, config_dummy1, config_dummy2, config_size_y or any other combination` - so need compare `config_dummy1` with `config_dummy2` and similar for each pairs? DataFrame always contains columns pairs? – jezrael Jul 04 '18 at 14:17
  • DataFrame contains a lot (about 50-60 columns) of config_COLNAME_y and config_COLNAME_x which are **not** in the same index position every time. config_size_x and config_size_y is one of them, but if I get this right, I can build my way from there. Yes they are always in _x and _y pairs – Savvas Radevic Jul 04 '18 at 14:22

2 Answers2

3

Since this question is tagged with styleframe:

from StyleFrame import StyleFrame, Styler

df = pd.DataFrame({'a': [1, 2], 'b': [1, 3]})

sf = StyleFrame(df)
sf.apply_style_by_indexes(sf[sf['a'] != sf['b']], styler_obj=Styler(bg_color='red'))
sf.to_excel('test.xlsx').save()

Will produce

enter image description here

If you want to color only a subset of the mismatching rows you can simply use cols_to_style param:

sf.apply_style_by_indexes(sf[sf['a'] != sf['b']], styler_obj=Styler(bg_color='red'),
                          cols_to_style=['a', 'b'])

enter image description here

DeepSpace
  • 78,697
  • 11
  • 109
  • 154
  • I'm open to any workaround suggestion :) I forgot to mention that I've also tried styleframe and failed to colour specific cells. :D Unfortunately, your answer colours the whole row. Is there a way to colour the specific cells of config_size_x and config_size_y only? – Savvas Radevic Jul 04 '18 at 14:27
  • 1
    @SavvasRadevic Yes, quite easily. See my updated answer – DeepSpace Jul 04 '18 at 14:30
  • This answer is also valid! Thank you very much! I've had an issue with my index, it was complaining `IndexError: single positional indexer is out-of-bounds` on cols_to_style. I solved it using `df.index = pd.RangeIndex(len(df.index))` before the styleframe. Also it takes about the same time as the other answer (with a foreach loop on the column pairs). – Savvas Radevic Jul 04 '18 at 17:59
2

You can create DataFrame of styles with apply:

def color(x):
    c1 = 'color: #ffffff; background-color: #ba3018'
    m = x['config_size_x'] != x['config_size_y']
    df1 = pd.DataFrame('', index=x.index, columns=x.columns)
    df1.loc[m, ['config_size_x', 'config_size_y']] = c1
    return df1

df.style.apply(color, axis=None)

General solution:

df = pd.DataFrame({
    'config_dummy1': ["dummytext"] * 10,
    'a_y': ["a"] * 10,
    'config_size_x': ["textstring"] * 10,
    'config_size_y': ["textstring"] * 10,
    'config_dummy2': ["dummytext"] * 10,
    'a_x': ["a"] * 10
})
df.at[5, 'config_size_x'] = "xandydontmatch"
df.at[9, 'config_size_y'] = "xandydontmatch"
df.at[0, 'a_x'] = "xandydontmatch"
df.at[3, 'a_y'] = "xandydontmatch"
print(df)

def color(x):
    c1 = 'color: #ffffff; background-color: #ba3018'
    df1 = pd.DataFrame('', index=x.index, columns=x.columns)

    #select only columns ends with _x and _y and sorting
    cols = sorted(x.filter(regex='_x$|_y$').columns)
    #loop by pairs and assign style by mask
    for i, j in zip(cols[::2],cols[1::2]):
        #pairs columns 
        #print (i, j)
        m = x[i] != x[j]
        df1.loc[m, [i, j]] = c1
    return df1

df.style.apply(color, axis=None).to_excel('styled.xlsx', engine='openpyxl')
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thank you for your quick reply and I see the apply() approach, but unfortunately it colours the whole row. Is there a way to colour the specific cells of config_size_x and config_size_y only? – Savvas Radevic Jul 04 '18 at 14:25
  • Yes, they are sorted in the end. I admire your zeal! :) – Savvas Radevic Jul 04 '18 at 14:49
  • 1
    @SavvasRadevic - Just added solution for non sorting input data, please check it. :) – jezrael Jul 04 '18 at 14:50
  • thank you! I had a small issue with NaN values, for some reason they were being colored (I was probably checking empty '' strings with NaN). `df.fillna('')` did it, all is well! Speed is something that needs improving, it takes ~40 seconds for 437 rows and 90 columns (compared to the couple of seconds required before applying the styles). But as far as I've understood that's an issue with the `apply()` in dataframes. I will play with it and see if I can improve it. :D – Savvas Radevic Jul 04 '18 at 16:46
  • 1
    I fixed the index too with `df.index = pd.RangeIndex(len(df.index))` and now it takes 20 seconds which is bearable at least for me. – Savvas Radevic Jul 04 '18 at 18:04