I'm trying to write a Python script that concatenates two CSV files and then drops the duplicate rows. Here is an example of the CSVs I'm concatenating:
csv_1
type  state  city  date        estimate  id
lux   tx     dal   2019/08/15  .8273452  10
sed   ny     ny    2019/05/12  .624356   10
cou   cal    la    2013/04/24  .723495   10
...
csv_2
type  state  city  date        estimate  id
sed   col    den   2013/05/02  .7234957  232
sed   mi     det   2015/11/17  .4249357  232
lux   nj     al    2009/02/28  .627234   232
...
As of now, my code to concatenate the two looks like this:
import pandas as pd

csv_1 = pd.read_csv('csv_1.csv')
csv_2 = pd.read_csv('csv_2.csv')

# Stack the two frames, then keep only the first occurrence of each key
union_df = pd.concat([csv_1, csv_2])
union_df.drop_duplicates(subset=['type', 'state', 'city', 'date'], keep='first', inplace=True)
Is there a way to ensure that only rows with id = 232 are deleted and none with id = 10? In other words, can I specify that only rows coming from the second CSV are removed from the concatenated result?
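From a quick toy test (made-up data mirroring the tables above, not my real files), it looks like `keep='first'` may already do this, since csv_1 comes first in the `concat` list and `drop_duplicates` keeps the earliest row for each key, but I'm not sure whether that ordering is guaranteed:

```python
import pandas as pd

# Toy stand-ins for the two CSVs (invented values, same columns as above)
csv_1 = pd.DataFrame({
    'type': ['lux', 'sed'],
    'state': ['tx', 'ny'],
    'city': ['dal', 'ny'],
    'date': ['2019/08/15', '2019/05/12'],
    'estimate': [.8273452, .624356],
    'id': [10, 10],
})
csv_2 = pd.DataFrame({
    'type': ['lux', 'sed'],          # first row shares csv_1's first key
    'state': ['tx', 'col'],
    'city': ['dal', 'den'],
    'date': ['2019/08/15', '2013/05/02'],
    'estimate': [.9999, .7234957],
    'id': [232, 232],
})

# csv_1 is listed first, so keep='first' retains its version of any
# (type, state, city, date) key that also appears in csv_2
union_df = pd.concat([csv_1, csv_2], ignore_index=True)
union_df = union_df.drop_duplicates(subset=['type', 'state', 'city', 'date'],
                                    keep='first')

print(union_df['id'].tolist())  # the duplicated key survives with id 10
```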
Thank you