I am trying to compare two CSV files. I want to compare the user_name column: if a user_name appears in both table1 and table2, write that row to table3 and print it.

For example:

table1.csv
id_Acco,     user_name,       post_time
1543603,     SameDavie ,      "2020/09/06"
1543595,     Johntim,         "2020/09/11"
1558245,     ACAtesdfgsf ,    "2020/09/19"

table2.csv
id_Acco,     user_name,     post_time
1543603,    SameDavie,      "2020/09/06"
1543595,    Johntim ,       "2020/09/11"
1558245,    Davidwillian,   "2020/09/19"

Output

table3.csv

id_Acco,     user_name,     post_time
1543603,     SameDavie ,    "2020/09/06"
1543595,     Johntim ,      "2020/09/11"

Code:

import pandas as pd

A = pd.read_csv(r'table1.csv')
B = pd.read_csv(r'table2.csv')
print(A - B)
print(B - A)
john div
  • Does it answer your question? https://stackoverflow.com/questions/19618912/finding-common-rows-intersection-in-two-pandas-dataframes – Arkadiusz May 28 '21 at 18:50
  • Do the rows matter or do you just want a list of the user names that appear in both files? – RJ Adriaansen May 28 '21 at 18:54

1 Answer

You can concat A and B and check for duplicated records:

z = pd.concat([A, B])[['user_name']]
z.loc[z.duplicated()].to_csv('table3.csv')

Output (in table3.csv):

   user_name
0  SameDavie
1    Johntim

P.S. If your files sometimes contain trailing spaces, as in your example, you may want to strip them after concatenation:

z = pd.concat([A, B])['user_name'].str.strip()
z.loc[z.duplicated()].to_frame().to_csv('table3.csv')
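
To try this without the files on disk, here is a minimal self-contained sketch of the same concat-and-duplicated idea. The `io.StringIO` data, the `read_table` and `common_user_names` helper names, and the use of `skipinitialspace=True` are my assumptions, not part of the answer above. With the padding shown in the question, a plain `pd.read_csv` would keep the leading spaces in the header, so the column would be named `'     user_name'` rather than `'user_name'`. Note also that `duplicated()` flags a name repeated inside a single file, not only names shared between the two files.

```python
import io

import pandas as pd

# Sample data copied from the question, including the stray padding.
TABLE1 = '''id_Acco,     user_name,       post_time
1543603,     SameDavie ,      "2020/09/06"
1543595,     Johntim,         "2020/09/11"
1558245,     ACAtesdfgsf ,    "2020/09/19"
'''
TABLE2 = '''id_Acco,     user_name,     post_time
1543603,    SameDavie,      "2020/09/06"
1543595,    Johntim ,       "2020/09/11"
1558245,    Davidwillian,   "2020/09/19"
'''


def read_table(text):
    """Hypothetical helper: parse CSV text, dropping spaces after each comma."""
    df = pd.read_csv(io.StringIO(text), skipinitialspace=True)
    df.columns = df.columns.str.strip()  # guard against padded header names
    return df


def common_user_names(a, b):
    """Return the user_name values that occur more than once in a + b."""
    names = pd.concat([a, b])["user_name"].str.strip()  # remove trailing spaces
    return names.loc[names.duplicated()].to_frame()


A = read_table(TABLE1)
B = read_table(TABLE2)
common = common_user_names(A, B)
print(common)
# To write the result to disk as in the answer: common.to_csv('table3.csv')
```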
perl
  • Thank you. Another question: how can I keep id_Acco and post_time after checking for duplicates? – john div May 28 '21 at 19:06
  • If you don't specify the column for `z` then it will use all of them, so `z = pd.concat([A, B])` as the first line, and then the same `z.loc[z.duplicated()]` (I'm assuming you stripped those spaces already -- if not, you can do it after concatenation) – perl May 28 '21 at 19:15
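
The suggestion in the last comment could be sketched as follows. This is only an illustration under stated assumptions: the two DataFrames are hypothetical in-memory copies of the question's tables (already parsed, with the trailing spaces kept in user_name), and `common_rows` is a name I made up. Because all columns are compared, a row counts as a duplicate only when id_Acco, user_name, and post_time all match.

```python
import pandas as pd

# Hypothetical in-memory copies of the question's tables.
A = pd.DataFrame({
    "id_Acco": [1543603, 1543595, 1558245],
    "user_name": ["SameDavie ", "Johntim", "ACAtesdfgsf "],
    "post_time": ["2020/09/06", "2020/09/11", "2020/09/19"],
})
B = pd.DataFrame({
    "id_Acco": [1543603, 1543595, 1558245],
    "user_name": ["SameDavie", "Johntim ", "Davidwillian"],
    "post_time": ["2020/09/06", "2020/09/11", "2020/09/19"],
})


def common_rows(a, b):
    """Return full rows that appear more than once after stacking a and b."""
    z = pd.concat([a, b])                       # keep every column
    z["user_name"] = z["user_name"].str.strip()  # strip after concatenation
    return z.loc[z.duplicated()]                 # compare on all columns


table3 = common_rows(A, B)
print(table3.to_csv(index=False))
```

If rows should match on user_name alone, even when id_Acco or post_time differ, passing `subset=['user_name']` to `duplicated()` is one option.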