I have two dataframes:
all_data:
AID VID Freq
0 00016A3E 0127C661 1
1 00016A3E 0C05DA5D 2
2 00016A3E 0C032814 1
3 00016A3E 0BF6C78D 1
4 00016A3E 0A79DFF1 1
5 00016A3E 07BD2FB2 1
6 00016A3E 0790E61B 1
7 00016A3E 0C24ED25 3
8 00016A3E 073630B5 3
9 00016A3E 06613535 1
10 00016A3E 05F809AF 1
11 00016A3E 05C625FF 1
12 00016A3E 04220EA8 4
13 00016A3E 013A29E5 1
14 00016A3E 0761C98A 1
15 00016AE9 0A769475 16
16 00016AE9 0A7DED0A 2
17 00016AE9 0ABF60DF 9
18 00016AE9 0AE3F25A 2
19 00016AE9 0AEFE12F 5
20 00016AE9 0BD8975A 2
21 00016AE9 44DF880B 1
22 00016AE9 43F9E08E 2
23 00016AE9 44EA5E08 2
24 00016AE9 4539ED1E 16
25 00016AE9 8516B55A 4
26 00016AE9 0972AFF2 1
27 00016AE9 0C559B34 1
28 00016AE9 06B5C040 7
29 00016AE9 0B0426FA 1
subset:
AID VID Freq
0 00016A3E 0C24ED25 3
1 00016A3E 0C05DA5D 2
2 00016AE9 0B0426FA 1
3 00016AE9 0AEFE12F 5
I need to create a third dataframe that has all the rows of all_data that do NOT exist in subset. Note that all rows in subset exist in all_data.
So the new df in this instance should be:
AID VID Freq
0 00016A3E 0127C661 1
2 00016A3E 0C032814 1
3 00016A3E 0BF6C78D 1
4 00016A3E 0A79DFF1 1
5 00016A3E 07BD2FB2 1
6 00016A3E 0790E61B 1
8 00016A3E 073630B5 3
9 00016A3E 06613535 1
10 00016A3E 05F809AF 1
11 00016A3E 05C625FF 1
12 00016A3E 04220EA8 4
13 00016A3E 013A29E5 1
14 00016A3E 0761C98A 1
15 00016AE9 0A769475 16
16 00016AE9 0A7DED0A 2
17 00016AE9 0ABF60DF 9
18 00016AE9 0AE3F25A 2
20 00016AE9 0BD8975A 2
21 00016AE9 44DF880B 1
22 00016AE9 43F9E08E 2
23 00016AE9 44EA5E08 2
24 00016AE9 4539ED1E 16
25 00016AE9 8516B55A 4
26 00016AE9 0972AFF2 1
27 00016AE9 0C559B34 1
28 00016AE9 06B5C040 7
I tried the methods described in "pandas get rows which are NOT in other dataframe", but they don't work because the indices of the two dataframes don't match.
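To make the requirement concrete, here is a minimal sketch of the comparison being asked for: rows are matched on their column values (AID, VID, Freq) rather than on the index. It uses a shortened sample of the data above, not the full frames. The names keys, mask and remaining are made up for this example, and it assumes the three columns together are enough to identify a row.

```python
import pandas as pd

# Shortened sample of all_data (not the full frame from the question).
all_data = pd.DataFrame({
    "AID":  ["00016A3E", "00016A3E", "00016A3E", "00016AE9", "00016AE9"],
    "VID":  ["0127C661", "0C05DA5D", "0C032814", "0AEFE12F", "0BD8975A"],
    "Freq": [1, 2, 1, 5, 2],
})

# Rows to exclude; note the index here does not line up with all_data.
subset = pd.DataFrame({
    "AID":  ["00016A3E", "00016AE9"],
    "VID":  ["0C05DA5D", "0AEFE12F"],
    "Freq": [2, 5],
})

# Assumption: these columns together identify a row.
keys = ["AID", "VID", "Freq"]

# Build a MultiIndex of the key values for each frame and flag the
# rows of all_data whose key tuple also appears in subset.
mask = all_data.set_index(keys).index.isin(subset.set_index(keys).index)

# Keep only the rows that are NOT in subset; all_data's original
# index labels are preserved.
remaining = all_data[~mask]
print(remaining)
```

Because the match is done on values, it does not matter that subset has its own 0..n index. An alternative along the same lines would be a left merge with indicator=True and keeping the rows marked "left_only", though that resets the index unless it is carried through explicitly.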