I have two csv files and I am trying to combine them in one based on a condition.
they both have the 'Import_ID' column which gives unique id for each row. What I need to do is to check if the Import_ID from df1 already exist in df2, if not add this row, else skip it. It would be best if it would be done without the loops or iterrows().
df1
Import_ID State ProductName
0 7027514279_100 716 TRANSCEND ESD220C 240GB SSD extern
1 7027514279_200.1 716 INTELLINET 48,3cm 19Zoll Steckdosenleist
2 7027514279_200.2 716 INTELLINET 48,3cm 19Zoll Steckdosenleist
3 7027514279_200.3 716 INTELLINET 48,3cm 19Zoll Steckdosenleist
4 7027514279_200.4 716 INTELLINET 48,3cm 19Zoll Steckdosenleist
and df2
Import_ID State ProductName
0 7027329459_100 716 BROTHER MFCJ6530DW MULTI-FUNCTION (AT)
1 7027514279_200.2 716 INTELLINET 48,3cm 19Zoll Steckdosenleist
2 7027514279_200.3 716 INTELLINET 48,3cm 19Zoll Steckdosenleist
3 7027329459_400 716 BROTHER LC-3219XLY Tinte Gelb (1500 S)
4 7027329459_500 20010 BROTHER LC-3219XLBK Tinte Schwarz
what I would like it to be is that df1 rows are only then imported when their Import_ID doesn't exist in the df2
df2 updated:
Import_ID State ProductName
0 7027329459_100 716 BROTHER MFCJ6530DW MULTI-FUNCTION (AT)
1 7027514279_200.2 716 INTELLINET 48,3cm 19Zoll Steckdosenleist
2 7027514279_200.3 716 INTELLINET 48,3cm 19Zoll Steckdosenleist
3 7027329459_400 716 BROTHER LC-3219XLY Tinte Gelb (1500 S)
4 7027329459_500 20010 BROTHER LC-3219XLBK Tinte Schwarz
5 7027514279_100 716 TRANSCEND ESD220C 240GB SSD extern
6 7027514279_200.1 716 INTELLINET 48,3cm 19Zoll Steckdosenleist
7 7027514279_200.4 716 INTELLINET 48,3cm 19Zoll Steckdosenleist
so in the example the df1 rows 1 nad 2 are not copied, because they already exist.
I tried by creating a temp file where I would store all of the rows from df1 that don't exist in df2 but it didn't work.
temp = numpy.where(file_in['Import_ID'] not in finalData['Import_ID'], file_in)
I don't know what else I should try , since I am pretty new to python.