
I am trying to write a pandas/Python script to do the following in a Jupyter notebook (see the Excel data for an example).

For each row of data, I need to take the number in col C and look at the value in col E of that same row. I then want to find the same number in col G and put the corresponding value from col E into col I.

If a value appears multiple times in col C with different corresponding values in col E, flag those col C values so I can take a look.
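A minimal sketch of this check, assuming the data is already loaded into a DataFrame with columns named C and E. The sample values and the names `codes_per_c`, `conflicting`, and `C_conflict` are hypothetical and used only for illustration:

```python
import pandas as pd

# Hypothetical sample data; only the column names C and E come from the question.
df = pd.DataFrame({
    "C": [111, 222, 333, 111, 444],
    "E": ["a", "b", "c", "d", "e"],
})

# Count the distinct E codes seen for each C value.
codes_per_c = df.groupby("C")["E"].nunique()

# C values that map to more than one E code need a manual look.
conflicting = codes_per_c[codes_per_c > 1].index.tolist()

# Boolean column marking every row whose C value is conflicting.
df["C_conflict"] = df["C"].isin(conflicting)
```

Here 111 is paired with both "a" and "d", so it is the only value that ends up flagged.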

For example, if col C contains 111 with code "a" in col E, then "a" would be placed in every row of col I where col G is 111.
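One possible way to express this lookup in pandas is `Series.map`, sketched below. The sample values and the name `lookup` are hypothetical, and keeping the first code seen for a repeated col C value is an assumption, not something the question specifies:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "C": [111, 222, 333],
    "E": ["a", "b", "c"],
    "G": [333, 111, 999],
})

# Build a C -> E lookup, keeping the first code seen for each C value
# (an assumption about how repeated C values should be handled).
lookup = df.drop_duplicates("C").set_index("C")["E"]

# Look up each G value; G values that never appear in C become NaN.
df["I"] = df["G"].map(lookup)
```

Unlike a row-by-row comparison, this fills col I even when the matching number sits in a different row of col C.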

If col C and col G do not have the same number, highlight those col C values in red.

I am having trouble figuring out how to code this up. If anyone can show me that would be greatly appreciated. Thanks

  • So, if col C and col G have the same number, we need to place the col E value in col I without highlighting the cell; otherwise, if C and G have different values, we need to highlight it. Is that what you are asking for? – Strange Dec 23 '19 at 19:47
  • Yes. So if col C had value 333, wherever there is "333" in col G, put "c" in col I, since it corresponds to the 333. – Joshua Zimmerman Dec 23 '19 at 19:53
  • else put c in col I and highlight it right? – Strange Dec 23 '19 at 19:56
  • Please make an attempt at a solution and come back with specific issues with a [reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) (not images of sample data [because...](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-on-so-when-asking-a-question/285557#285557)). – Parfait Dec 23 '19 at 21:11
  • @Parfait Said exactly what I was going to. – AMC Dec 23 '19 at 23:31
  • Check the solution I've posted. – Strange Dec 24 '19 at 05:37

1 Answer


Here's one way to do it:

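import pandas as pd

# sample data: col E holds the code that belongs to the number in col C
dct = {'C': [111, 222, 333, 111, 444], 'E': ['a', 'b', 'c', 'd', 'e'], 'G': [111, 123, 333, 111, 444]}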

df = pd.DataFrame(dct)

highlight = []
vals = []
for i in range(len(df)):
    if df['C'][i] == df['G'][i]:
        highlight.append(False)
        vals.append(df['E'][i])
    else:
        highlight.append(True)
        vals.append(None)

df['I'] = vals

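def highlight_cells(x):
    # x is the whole DataFrame because axis=None is passed to Styler.apply
    c1 = 'background-color: red'
    c2 = ''

    # start with an empty style for every cell
    df1 = pd.DataFrame(c2, index=x.index, columns=x.columns)

    # set the red background on col C for the rows flagged in highlight
    df1.loc[highlight, 'C'] = c1

    return df1

# writing the styled frame to Excel requires the openpyxl package
df.style.apply(highlight_cells, axis=None).to_excel('styled.xlsx', engine='openpyxl')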

First, build the boolean list highlight, which marks the rows of col C that need to be highlighted. The function highlight_cells then uses this list to build a DataFrame of CSS styles with the same shape as df, and df.style.apply() applies those styles to df.
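The same mask and col I can also be built without an explicit loop. This is a sketch of an alternative, not the code used in this answer, and it assumes the same sample data:

```python
import pandas as pd

# Same sample data as in the answer.
df = pd.DataFrame({
    'C': [111, 222, 333, 111, 444],
    'E': ['a', 'b', 'c', 'd', 'e'],
    'G': [111, 123, 333, 111, 444],
})

# True where col C and col G differ in the same row (rows to highlight).
highlight = df['C'] != df['G']

# Keep the col E value where the numbers match; leave the rest missing.
df['I'] = df['E'].where(~highlight)
```

Because `highlight` is a boolean Series aligned with `df`, it can be passed to `df1.loc[highlight, 'C']` in the same way as the list.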


Strange