I have two pandas DataFrames: the first has shape (8190, 161) and the second has shape (14026, 3). The first column of both DataFrames contains the names, and every name in the first DataFrame is also present in the second. My goal is to reduce the second DataFrame to the same number of rows as the first by keeping only the rows whose names appear in the first DataFrame, in the same order as in the first one. By order, I mean the row names together with all the values in those rows.
By doing this:
y2 = df2.iloc[:, 0]  # names in the second DataFrame
y1 = df1.iloc[:, 0]  # names in the first DataFrame
y = list(set(y2) - set(y1))  # names in df2 that are not in df1
I can get the names that are present in the second DataFrame but not in the first. len(y) gives 5836, which is the number of additional rows in the second DataFrame.
My problem is how to identify these extra rows in the second DataFrame and delete them, so that only the rows matching the first DataFrame remain.
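To make the goal concrete, here is a minimal sketch of one possible way to do this on toy data. It assumes the name column is literally called "names" in both frames (as in the head() output) and that each name occurs only once in the second DataFrame; the values are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for the real frames (made-up values, not the actual data).
df1 = pd.DataFrame({"names": ["ID-3", "ID-1", "ID-2"], "0": [3.0, 1.0, 5.0]})
df2 = pd.DataFrame({
    "names": ["ID-1", "ID-9", "ID-2", "ID-3", "ID-7"],
    "B": [-0.1, -0.2, -0.3, -0.4, -0.5],
    "C": [0.0, 0.0, 0.0, 4.1, 0.0],
})

# Index df2 by name, then pull the rows in exactly df1's order.
# Names that are not in df1 (ID-9, ID-7) are dropped along the way.
# Assumption: names are unique in df2; reindex fails on duplicate labels.
df2_aligned = (
    df2.set_index("names")
       .reindex(index=df1["names"])
       .reset_index()
)

print(df2_aligned)
```

If a name from df1 were missing in df2, reindex would produce a row of NaN for it rather than raise, so this relies on the statement above that every name in df1 exists in df2.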
Finally, df1.head():
       names                   0  ...  158  159
0  ID-865950  3.0000000000000004  ...  Nan  Nan
1  ID-866199                 1.0  ...  Nan  Nan
2  ID-862617                 3.0  ...  Nan  Nan
3  ID-867838                 5.0  ...  Nan  Nan
4   ID-27972                 5.0  ...  Nan  Nan
df2.head():
       names          B       C
0  ID-865950  -0.206854  0.0000
1  ID-866199  -0.268366  0.0000
2  ID-862617  -0.368426  0.0000
3  ID-867838  -0.693050  0.0000
4   ID-27972  -2.103586  4.1045
As you can see, in the end the names in the first and second DataFrames should be in the same order.
Thanks in advance.
Update: The post "Pandas Merging 101" explains how to merge DataFrames, but I want to get back only the second DataFrame, not a combined one.
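For reference, a merge can still return only the second DataFrame's columns if just the name column of the first DataFrame is used as the left side. The following is a sketch under the same assumptions as before (a shared column called "names", unique names in the second DataFrame, made-up toy values), not necessarily the approach that post recommends.

```python
import pandas as pd

# Toy stand-ins for the real frames (made-up values, not the actual data).
df1 = pd.DataFrame({"names": ["ID-3", "ID-1", "ID-2"], "0": [3.0, 1.0, 5.0]})
df2 = pd.DataFrame({
    "names": ["ID-1", "ID-9", "ID-2", "ID-3", "ID-7"],
    "B": [-0.1, -0.2, -0.3, -0.4, -0.5],
    "C": [0.0, 0.0, 0.0, 4.1, 0.0],
})

# Only df1's name column takes part in the merge, so none of df1's
# other columns end up in the result. A left merge keeps the key order
# of the left frame, i.e. the row order of df1.
df2_only = df1[["names"]].merge(df2, on="names", how="left")

print(df2_only)
```

Unlike the index-based variant, a left merge would repeat a df1 row for every matching row if a name appeared more than once in df2, so the row count should be checked against len(df1) afterwards.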