I am attempting to merge 2 dataframes in pandas based on a common column. Reading the documentation on merging, everything I am doing implicates that the dataframes should merge on the columns specified, while only keeping common values. It certainly keeps the common values, but produces a massive dataframe that is essentially an entire "right" dataframe for every index in the "left" dataframe.
My left dataframe is
left
Out[325]:
A B C D MergeThis
0 38 43 20 34 -N2
1 14 44 23 91 -N2
2 55 50 30 92 -N2
3 29 20 56 0 -N2
4 61 6 38 44 -N2
5 18 61 30 7 -N2
6 67 45 60 81 -N2
7 7 53 60 72 +N2
8 15 46 0 82 +N2
9 6 90 83 36 +N2
10 0 87 64 82 +N2
11 13 22 13 39 +N2
12 65 88 50 34 +N2
13 35 31 72 34 +N2
and my right data frame is
right
Out[326]:
E F G H MergeThis
0 22 81 97 28 +N2
1 20 4 44 48 +N2
2 19 41 52 25 +N2
3 30 37 66 24 +N2
4 32 40 56 92 +N2
5 95 76 91 31 +N2
6 36 66 17 85 +N2
Obviously, I am attempting to merge on the "MergeThis" column, such that I end up with the +N2 values combined, with the columns from both "left" and "right". However, what I end up with is something strange:
left.merge(right, on='MergeThis')
Out[327]:
A B C D MergeThis E F G H
0 7 53 60 72 +N2 22 81 97 28
1 7 53 60 72 +N2 20 4 44 48
2 7 53 60 72 +N2 19 41 52 25
3 7 53 60 72 +N2 30 37 66 24
4 7 53 60 72 +N2 32 40 56 92
5 7 53 60 72 +N2 95 76 91 31
6 7 53 60 72 +N2 36 66 17 85
7 15 46 0 82 +N2 22 81 97 28
8 15 46 0 82 +N2 20 4 44 48
9 15 46 0 82 +N2 19 41 52 25
10 15 46 0 82 +N2 30 37 66 24
11 15 46 0 82 +N2 32 40 56 92
12 15 46 0 82 +N2 95 76 91 31
13 15 46 0 82 +N2 36 66 17 85
14 6 90 83 36 +N2 22 81 97 28
15 6 90 83 36 +N2 20 4 44 48
16 6 90 83 36 +N2 19 41 52 25
17 6 90 83 36 +N2 30 37 66 24
18 6 90 83 36 +N2 32 40 56 92
19 6 90 83 36 +N2 95 76 91 31
20 6 90 83 36 +N2 36 66 17 85
21 0 87 64 82 +N2 22 81 97 28
22 0 87 64 82 +N2 20 4 44 48
23 0 87 64 82 +N2 19 41 52 25
24 0 87 64 82 +N2 30 37 66 24
25 0 87 64 82 +N2 32 40 56 92
26 0 87 64 82 +N2 95 76 91 31
27 0 87 64 82 +N2 36 66 17 85
28 13 22 13 39 +N2 22 81 97 28
29 13 22 13 39 +N2 20 4 44 48
30 13 22 13 39 +N2 19 41 52 25
31 13 22 13 39 +N2 30 37 66 24
32 13 22 13 39 +N2 32 40 56 92
33 13 22 13 39 +N2 95 76 91 31
34 13 22 13 39 +N2 36 66 17 85
35 65 88 50 34 +N2 22 81 97 28
36 65 88 50 34 +N2 20 4 44 48
37 65 88 50 34 +N2 19 41 52 25
38 65 88 50 34 +N2 30 37 66 24
39 65 88 50 34 +N2 32 40 56 92
40 65 88 50 34 +N2 95 76 91 31
41 65 88 50 34 +N2 36 66 17 85
42 35 31 72 34 +N2 22 81 97 28
43 35 31 72 34 +N2 20 4 44 48
44 35 31 72 34 +N2 19 41 52 25
45 35 31 72 34 +N2 30 37 66 24
46 35 31 72 34 +N2 32 40 56 92
47 35 31 72 34 +N2 95 76 91 31
48 35 31 72 34 +N2 36 66 17 85
Which has essentially created a duplicate dataframe for every index of my "right" dataframe.
On the one hand, I know how to do another workaround to get the dataframe I want with more code and breaking up one of the dataframes a bit more so that the merge is "prettier". However, I really want to know WHY this is happening? I want to be able to use the merge functionality in the future, but I don't fully understand what pandas is trying to do to produce this.
Any help is greatly appreciated.