I'm creating a merge from two tables. First table look like this:
a b c
0 32 171 28
1 32 172 28
2 1014 173 28
3 1014 179 28
4 1014 154 26
5 1049 156 26
2nd table looks like this:
a d
0 32 fdxjgset
1 32 5j4j64j4
2 1014 4564jsr5
3 1014 5jhszxse
4 1014 kuts5555
5 1049 srh5jx5x
I'm expecting to get something like this:
a b c d
0 32 171 28 fdxjgset
1 32 172 28 5j4j64j4
2 1014 173 28 4564jsr5
3 1014 179 28 5jhszxse
4 1014 154 26 kuts5555
5 1049 156 26 srh5jx5x
But I'm getting duplicates of the duplicate 'a' rows like this:
a b c d
0 32 171 28 fdxjgset
1 32 172 28 5j4j64j4
2 32 171 28 fdxjgset
3 32 172 28 5j4j64j4
4 1014 173 28 4564jsr5
5 1014 179 28 5jhszxse
6 1014 154 26 kuts5555
7 1014 173 28 4564jsr5
8 1014 179 28 5jhszxse
9 1014 154 26 kuts5555
10 1049 156 26 srh5jx5x
My code is:
data_1 = pd.read_csv("First file.csv",encoding='latin1')
data_2 = pd.read_csv("Second file.csv",encoding='latin1')
data_2_dups = data_zips.drop_duplicates() #remove duplicates
data = data_1.merge(data_2_dups, on='a', how = 'left', indicator=True)
#data1 = data.drop_duplicates()
data.to_csv("merged file.csv")
Now I did remove all duplicates like others said on different threads here, but that doesn't seem to work. It's still creating duplicate rows for duplicates. Any idea what am I doing wrong? Thanks.