Populate a column in excel given corresponding data with Pandas/Python

Question

I am trying to write a pandas/Python script to do the following in a Jupyter notebook (see the Excel data for an example).

For each row, I need to take the number in col C and look at the code in col E of that same row. I then want to find the same number in col G and put the corresponding code from col E into col I.

If a value appears multiple times in col C with different corresponding values in col E, flag those col C values so I can take a look.

For example, if col C contains 111 with code “a” in col E, code “a” should be placed in col I wherever col G has the number 111.

If the numbers do not match, highlight those values in col C in red.

I am having trouble figuring out how to code this up. If anyone can show me that would be greatly appreciated. Thanks

So, if col C and col G have the same number, we need to place the col E value in col I without highlighting the cell; otherwise, we need to highlight the cell when C and G have different values. Is that what you are asking for? — Strange, Dec 23 '19 at 19:47
Yes. So if col C has the value 333, wherever there is "333" in col G, put "c" in col I, since it corresponds to the 333. — Joshua Zimmerman, Dec 23 '19 at 19:53
Please make an attempt at a solution and come back with specific issues with a [reproducible example](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) (not images of sample data [because...](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-on-so-when-asking-a-question/285557#285557)). — Parfait, Dec 23 '19 at 21:11

score 1 · Answer 1 · answered Dec 24 '19 at 05:01

Here's what you want:

import pandas as pd

# Sample data standing in for columns C, E and G of the spreadsheet
dct = {'C':[111,222,333,111,444],'E':['a','b','c','d','e'],'G':[111,123,333,111,444]}

df = pd.DataFrame(dct)

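highlight = []  # True for rows whose C cell should be highlighted
vals = []       # values to write into column I
for i in range(len(df)):
    if df['C'][i] == df['G'][i]:
        # C and G match in this row: copy the E value, no highlight
        highlight.append(False)
        vals.append(df['E'][i])
    else:
        # C and G differ in this row: leave I empty and highlight C
        highlight.append(True)
        vals.append(None)

df['I'] = vals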

def highlight_cells(x):
    c1 = 'background-color: red'
    c2 = ''

    # start with an empty style for every cell of the styled frame
    df1 = pd.DataFrame(c2, index=x.index, columns=x.columns)

    # set the red background on column C for the rows marked in highlight
    df1.loc[highlight, 'C'] = c1

    return df1

# writing a styled frame to Excel requires the openpyxl package
df.style.apply(highlight_cells, axis=None).to_excel('styled.xlsx', engine='openpyxl')

First, build the boolean list highlight, which marks the rows of col C that need to be highlighted. The function highlight_cells then uses this list to build a DataFrame of CSS styles with the same shape as df, and df.style.apply() applies those styles to df before it is written to Excel.
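Note that the loop above compares C and G within the same row. If what is wanted is a lookup, as the comments under the question suggest (wherever a number appears in G, fill I with the E code that number has in C), something like the following may be closer. This is only a minimal sketch under a few assumptions of mine: the names codes_per_c, ambiguous, lookup, flag_c and highlight_flagged are made up for illustration, C values with more than one E code are left out of the lookup, and "do not have the same number" is read as "this C value never appears in G".

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({
    'C': [111, 222, 333, 111, 444],
    'E': ['a', 'b', 'c', 'd', 'e'],
    'G': [111, 123, 333, 111, 444],
})

# Count the distinct E codes for each C value; more than one is ambiguous
codes_per_c = df.groupby('C')['E'].nunique()
ambiguous = set(codes_per_c[codes_per_c > 1].index)

# Build the C -> E lookup from the unambiguous C values only (assumption)
unambiguous = df[~df['C'].isin(ambiguous)]
lookup = dict(zip(unambiguous['C'], unambiguous['E']))

# Fill I wherever G has a number with a known code; everything else stays NaN
df['I'] = df['G'].map(lookup)

# Flag C values that are ambiguous or that never appear in G (assumption)
flag_c = df['C'].isin(ambiguous) | ~df['C'].isin(df['G'])

def highlight_flagged(x):
    # Empty style everywhere, red background on the flagged C cells
    styles = pd.DataFrame('', index=x.index, columns=x.columns)
    styles.loc[flag_c, 'C'] = 'background-color: red'
    return styles

# To write the styled result (needs openpyxl and jinja2 installed):
# df.style.apply(highlight_flagged, axis=None).to_excel('styled.xlsx', engine='openpyxl')
```

With the sample data, 111 is treated as ambiguous because it maps to both 'a' and 'd' in E, so the rows where G is 111 are left empty in I rather than filled with a guessed code. If you would rather keep the first or last code seen, build the lookup from df.drop_duplicates('C') instead.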
