1

Everyone, I have some problems for to leaning program by myself

  1. I have two different excel to compare...

Data1.xlsx

|  Name   |  Reg Date  |
|Annie    | 2021-07-01 |
|Billy    | 2021-07-02 |
|Cathrine | 2021-07-03 |
|David    | 2021-07-04 |
|Eric     | 2021-07-04 |

Data2.xlsx

|  Name   |   City    |  Reg Date  | Gender | Data1.xlsx |
|Alex     | Hong Kong | 2021-07-04 | Male   |            |
|Annie    | Hong Kong | 2021-07-01 | Female |            |
|Bob      | Taipei    | 2021-07-02 | Male   |            |
|Lucy     | Tokyo     | 2021-07-01 | Female |            |
|David    | London    | 2021-07-04 | Male   |            |
|Kate     | New York  | 2021-07-03 | Female |            |
|Cathrine | London    | 2021-07-03 | Female |            |
|Rose     | Hong Kong | 2021-07-04 | Female |            |
  1. I get 'Name' & 'Reg Date' for key to merge

    import pandas as pd 
    dt1 = pd.read_excel('Data1.xlsx')
    dt2 = pd.read_excel('Data2.xlsx')
    df_merge = pd.merge(dt1.iloc[:, [0, 1]], dt2.iloc[:, [0, 2]], on=['Name', 'Reg Date'], how='outer', indicator=True)
    
    i = 0
    rows_to_color = []
    
    for a in df_merge.iloc[:, [2]].values:
        if a == 'both':
           rows_to_color.append(i)
        i += 1
    
    
    |  Name   |  Reg Date  |   _merge   |
    |Alex     | 2021-07-04 | right_only |
    |Annie    | 2021-07-01 | both       |
    |Billy    | 2021-07-02 | left_only  | 
    |Bob      | 2021-07-02 | right_only |
    |Lucy     | 2021-07-01 | right_only |
    |David    | 2021-07-04 | both       |
    |Eric     | 2021-07-04 | left_only  |
    |Kate     | 2021-07-03 | right_only |
    |Cathrine | 2021-07-03 | both       |
    |Rose     | 2021-07-04 | right_only |
    
  2. I try to coding to highlight 'left_only' & 'right_only' for 'Data2.xlsx', but not work.

    def bg_color(col):
    color = '#ffffff'
    return 'background-color: %s' % color
    if i in rows_to_color:
        for i, x in col.iteritems():
            styled = df_merge.style.apply(bg_color)
    
  3. I have no idea how to make highlight unmatch row and mark 'Y/N' in 'Data2.xlsx', below pic is my expected result. would you mind to teach me how to coding?

    enter image description here

CHL
  • 87
  • 6

1 Answers1

0

Use left join in merge and set Y/N by numpy.where first:

#change order dt2, dt1
df_merge = pd.merge(dt2, 
                    dt1[['Name', 'Reg Date']], 
                    on=['Name', 'Reg Date'], 
                    how='left', indicator=True)
df_merge['Data1.xlsx'] = np.where(df_merge.pop('_merge').eq('both'), 'Y', 'N')
print (df_merge)
       Name       City    Reg Date  Gender Data1.xlsx
0      Alex  Hong Kong  2021-07-04    Male          N
1     Annie  Hong Kong  2021-07-01  Female          Y
2       Bob     Taipei  2021-07-02    Male          N
3      Lucy      Tokyo  2021-07-01  Female          N
4     David     London  2021-07-04    Male          Y
5      Kate   New York  2021-07-03  Female          N
6  Cathrine     London  2021-07-03  Female          Y
7      Rose  Hong Kong  2021-07-04  Female          N

And then set colors by N rows:

def bg_color(x):
    c = 'background-color: yellow'
    # condition
    m = x["Data1.xlsx"].eq('N')
    # DataFrame of styles
    df1 = pd.DataFrame('', index=x.index, columns=x.columns)

    # set columns by condition
    return df1.mask(m, c)

styled = df_merge.style.apply(bg_color, axis=None)

styled.to_excel('styled.xlsx', engine='openpyxl', index=False)
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • Thank you for your helpful and add to my knowledge, but i want to copy ' Data1.xlsx' column data of merged to original 'Data2' excel file, not save to new excel file... – CHL Jan 31 '22 at 12:25
  • @CHL - there are some another columns? Or some another sheets? – jezrael Jan 31 '22 at 12:31
  • @CHL - Understand, what is reason cannot create another file? Or is possible use `styled.to_excel('Data2.xlsx', engine='openpyxl', index=False)` for overwrite file `Data2.xlsx` ? – jezrael Jan 31 '22 at 14:13
  • Sorry English isn't my first language, I just have 'Data1' with 1 sheet and 'Data2' with 1 sheet excel files. Before compare two excel file, 'Data1.xlsx' column of 'Data2' is empty. I though can only copy and paste 'Y/N' value to 'column of Data2' excel file after compare. Maybe I have not enough programming logic... – CHL Jan 31 '22 at 14:23
  • @CHL - then use [this](https://stackoverflow.com/questions/20219254/how-to-write-to-an-existing-excel-file-without-overwriting-data-using-pandas) solution - possible, but not easy. – jezrael Jan 31 '22 at 14:26
  • In addition to overwrite, is do you know any about 'copy and paste' method? – CHL Jan 31 '22 at 14:33
  • @CHL - it is python, not possible same way like human – jezrael Jan 31 '22 at 14:33