1

I have two pandas DataFrames. First with shape (8190, 161) and the second with shape (14026, 3). The first column in both Dataset contains the name. All names in First Dataframe are present in the second DataFrame. My goal is to reduce the second DataFrame shape to the first by keeping only names and corresponding row values, which are present in the first DataFrame, with the same order as in the first one. Here By order, I mean the row names and all values in those rows.

By doing this

y2 = df2.iloc[:, 0]
y1 = df1.iloc[:, 0] 
y = [i for i in set(y2) if i not in set(y1)]

I can get the names which are present in the second DataFrame but not in the first. When I print the len(y), it gives me 5836, which are the the additional number of rows present in the second DataFrame.

Here, my problem is to map such rows from the second DataFrame to the first and delete such rows in second DataFrame.

At last, df1.head(): 

      names                             0 ...   158  159
0  ID-865950       3.0000000000000004 ...   Nan  Nan
1  ID-866199                      1.0 ...   Nan  Nan
2  ID-862617                      3.0 ...   Nan  Nan
3  ID-867838                      5.0 ...   Nan  Nan
4  ID-27972                       5.0 ...   Nan  Nan

df2.head():

      names              B          C
0  ID-865950         -0.206854    0.0000
1  ID-866199         -0.268366    0.0000
2  ID-862617         -0.368426    0.0000
3  ID-867838         -0.693050    0.0000
4  ID-27972         -2.103586     4.1045

As you can see the names in the first and second DataFrame are in the same order at last.

Thanks in advance.

Update: The post "Pandas Merging 101" explains about merging the DataFrames but I wanted to return only the second DataFrame.

hemanta
  • 1,405
  • 2
  • 13
  • 23
  • 1
    do you need `df = df1.merge(df2, on='names')` ? – jezrael Feb 09 '19 at 06:48
  • 1
    Actually if we can write second DataFrame as the first one by removing excess rows would be enough but the order of rows in the resulting DataFrame should be same as in first DataFrame. Thanks! – hemanta Feb 09 '19 at 06:53
  • So it working nice? Or not? – jezrael Feb 09 '19 at 06:53
  • @jezrael, by doing so it will merge two DataFrames in a way that I want. That works and solve my problem for now. I still have in mind that, there might be a way to write the second DataFrame separately after doing such operation. Any suggestions? – hemanta Feb 09 '19 at 17:57
  • Are values unique in first column? – jezrael Feb 09 '19 at 18:00
  • Do you mean the names? If so yes, but not in other columns. – hemanta Feb 09 '19 at 18:01
  • 1
    So possible solution is `df1[y1.isin(y2)]` and then sorting both DataFrames by first columns. – jezrael Feb 09 '19 at 18:06
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/188154/discussion-between-hemanta-and-jezrael). – hemanta Feb 09 '19 at 18:07

1 Answers1

0

By following jezrael suggestions, I got the desired result by doing:

df3 = df2[y2.isin(y1)]
df3.sort_values('names')

Thanks jezreal.

hemanta
  • 1,405
  • 2
  • 13
  • 23
  • Now, I have additional point: is there a way to do something like df3 = df2[y2 not in y1], I tried but this did not worked? – hemanta Feb 11 '19 at 19:08