0

I want to compare 2 csv files that is file 1 and file 2 on basis of column. If column of file 1 matches with column of file 2 then remove that entire row of file 1.

Example of file 1

sr. no.,username,id
101,Berlin240,835070687
102,X_PSYCH_X,1271001789 
103,xenoo369,570078204
104,xarat581,1665916522
105,xandy88,639040049

Example of file 2:

sr. no.,username,id
101,Berlin240,835070687
103,xenoo369,570078204
105,xandy88,639040049

Now comparing file2 and removing all rows in file 1 that matches with the column of file 1.

Now the file1 looks like this:

sr. no.,username,id
102,X_PSYCH_X,1271001789
104,xarat581,1665916522

halfer
  • 19,824
  • 17
  • 99
  • 186
  • Can you post what you've tried, and let us know what part of your code isn't working or what challenges you're running into? – Simon Mar 07 '22 at 03:44
  • actually dont know where to start – rahul gupta Mar 07 '22 at 03:45
  • No worries, it appears to be your problem is similar to another post. Try reviewing this [post](https://stackoverflow.com/questions/38996033/python-compare-two-csv-files-and-print-out-differences). Once you have a specific problem with the code, feel free to create another post or update this one – Simon Mar 07 '22 at 03:49

2 Answers2

0

The following code is a python solution. First install pandas. You can probably accomplish this with the command

>>> pip install pandas

The code is ....

import pandas as pd

file1_path = r"D:\Development\Test\file1.csv"
file2_path = r"D:\Development\Test\file2.csv"

df1 = pd.read_csv(file1_path)
df2 = pd.read_csv(file2_path)

new_df = df1[~df1['id'].isin(df2['id'])]

print(new_df)

new_df.to_csv(file1_path)  # saving difference in file1

To see the output open your file1. output:

   sr. no.    username          id
     102     X_PSYCH_X    1271001789
     104     xarat581     1665916522
bob.sacamento
  • 6,283
  • 10
  • 56
  • 115
N S
  • 1
  • 1
-1

solution:

import pandas as pd

df1 = pd.read_csv("df1.csv")
df2 = pd.read_csv("df2.csv")

print(df1)
print(df2)

df_diff = pd.concat([df1,df2]).drop_duplicates(keep=False)

print(df_diff)

df1

     sr. no.  username      id
0      101  Berlin240   835070687
1      102  X_PSYCH_X  1271001789
2      103   xenoo369   570078204
3      104   xarat581  1665916522
4      105    xandy88   639040049

df2

     sr. no.  username     id
0      101  Berlin240  835070687
1      103   xenoo369  570078204
2      105    xandy88  639040049

df_diff

     sr. no.  username      id
1      102  X_PSYCH_X  1271001789
3      104   xarat581  1665916522
wekular
  • 91
  • 9