I have 2 CSV files.

  • The first CSV file has a single field (column) named 'video_url' that contains a list of unique URLs.
  • The second CSV file contains 10-15 fields, one of which is the same 'video_url' field found in the first CSV file.

Here is my dilemma/problem statement:

I am trying to write Python code that compares these two CSV files using only the 'video_url' column/field. If there is an exact match, that record is NOT included in a new CSV file; the new CSV file would include only the records with no exact match.

(and please bear with me as I am completely new to Python and programming in general).

  • I apologize - it should be *deduplicated – MrsBreadstick Nov 27 '19 at 21:37
  • can you show what you have tried so far, but this sounds like an outer join problem regardless – gold_cy Nov 27 '19 at 21:37
  • [`read_csv`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) and [`merge`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html) are going to be your friends here. Read -> Merge on video_url. done. – MattR Nov 27 '19 at 21:42
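The read-then-merge idea from the comments above could look roughly like the sketch below. The sample data, the `title` column, and the output file name are made up for illustration; only `video_url` comes from the question. It assumes the records to keep are those from the second file, and uses a left merge with `indicator=True` to mark which rows found a match.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the two CSV files described in the question
first_csv = io.StringIO(
    "video_url\n"
    "http://example.com/a\n"
    "http://example.com/c\n"
)
second_csv = io.StringIO(
    "video_url,title\n"
    "http://example.com/a,Alpha\n"
    "http://example.com/b,Beta\n"
    "http://example.com/c,Gamma\n"
    "http://example.com/d,Delta\n"
)

urls = pd.read_csv(first_csv)
records = pd.read_csv(second_csv)

# Left-join the full records against the URL list. indicator=True adds a
# "_merge" column: "both" for matched rows, "left_only" for unmatched rows.
merged = records.merge(
    urls.drop_duplicates(), on="video_url", how="left", indicator=True
)

# Keep only the rows with no match, then drop the helper column
unmatched = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

# Writing the result out (file name is an assumption):
# unmatched.to_csv("new_file.csv", index=False)
```

With real files, the two `io.StringIO` objects would simply be replaced by the paths passed to `read_csv`.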

1 Answer

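What I would do is iterate row by row over dataframe2, check whether each row's video_url appears in dataframe1, and if not, add the row to a new dataframe. For example:

    import pandas as pd

    dataframe1 = pd.read_csv('your_file_1.csv')
    dataframe2 = pd.read_csv('your_file_2.csv')

    # Use a set of the values: `x in series` would test the index, not the values
    known_urls = set(dataframe1['video_url'])
    new_rows = []

    # iterrows() yields (index, row) pairs; a bare `for i in dataframe2` yields column names
    for _, row in dataframe2.iterrows():
        if row['video_url'] not in known_urls:
            new_rows.append(row)

    new_dataframe = pd.DataFrame(new_rows, columns=dataframe2.columns)

After doing this you can print new_dataframe; it contains only the rows of dataframe2 whose video_url is not in dataframe1. To save it as a CSV file, call new_dataframe.to_csv with a file name of your choice and index=False.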
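If the files are large, a vectorized filter may be faster than a Python loop. The following is a minimal sketch using `Series.isin`; the small in-memory frames and the `title` column are invented for illustration and stand in for the two files read with `read_csv`.

```python
import pandas as pd

# Hypothetical sample data standing in for the two CSV files
dataframe1 = pd.DataFrame({"video_url": ["http://example.com/a", "http://example.com/c"]})
dataframe2 = pd.DataFrame(
    {
        "video_url": [
            "http://example.com/a",
            "http://example.com/b",
            "http://example.com/c",
        ],
        "title": ["Alpha", "Beta", "Gamma"],
    }
)

# True for every row of dataframe2 whose video_url also appears in dataframe1
is_match = dataframe2["video_url"].isin(dataframe1["video_url"])

# ~ inverts the boolean mask, keeping only the rows without a match
new_dataframe = dataframe2[~is_match]

# Writing the result out (file name is an assumption):
# new_dataframe.to_csv("new_file.csv", index=False)
```

Note that `isin` compares values exactly, so URLs that differ only in whitespace or letter case would not count as matches.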

  • however if your datasets are too big, it may take too much time – Jorge A. Salazar Nov 27 '19 at 21:48
  • I apologize for the late reply. My code was similar to this, since my problem resembles the one here (https://stackoverflow.com/questions/38996033/python-compare-two-csv-files-and-print-out-differences/38996374). However, I am receiving a "TypeError: string indices must be integers". – MrsBreadstick Dec 02 '19 at 02:00
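The error reported in the last comment is consistent with looping over a DataFrame directly: that yields the column labels (plain strings), and indexing a string with `'video_url'` raises a `TypeError`. The commenter's actual code is not shown, so this is only a likely explanation, illustrated with made-up data:

```python
import pandas as pd

# Hypothetical one-row frame; only the 'video_url' column name comes from the question
df = pd.DataFrame({"video_url": ["http://example.com/a"], "title": ["Alpha"]})

# Iterating a DataFrame directly yields its column labels, not its rows
labels = [item for item in df]

# Indexing one of those labels (a plain str) with a string key raises TypeError
try:
    labels[0]["video_url"]
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# iterrows() yields (index, row) pairs, and each row supports row["video_url"]
urls_from_rows = [row["video_url"] for _, row in df.iterrows()]
```

Switching the loop to `iterrows()`, or avoiding the loop entirely with a vectorized filter, sidesteps the problem.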