I have firstDF:

rs     Chr      MapInfo         Name       SourceSeq
1       A1       B1              C1          D1
2       A2       B2              C2          D2
3       A3       B3              C3          D3
4       A4       B4              C4          D4
5       A5       B5              C5          D5

And secondDF:

Chr       MapInfo     Name    SourceSeq       Unnamed: 0       rs
 1          A1          B1        C1             D1            E1
 4          A4          B4        C4             D4            E4
 8          A8          B8        C8             D8            E8
 10         A10         B10       C10            D10           E10

I need to create a new data frame that contains only the rows from secondDF which do not exist in firstDF:

newDF:

Chr       MapInfo     Name    SourceSeq       Unnamed: 0       rs
8          A8          B8        C8             D8            E8
10         A10         B10       C10            D10           E10

I want to filter it by Name. What is the best way to do that?

I thought about a full outer merge, but the columns are different and honestly I don't know how to do it properly.

My second thought was a loop, but that's not efficient.

And last, I tried to do it with:

new= secondDF[~firstDF.Name.isin(secondDF.name)] 

but I got:

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match)

Can someone give me advice on this task?

Dvyn Resh
martin
  • https://stackoverflow.com/questions/23460345/selecting-unique-rows-between-two-dataframes-in-pandas – ComplicatedPhenomenon Aug 01 '19 at 07:19
  • "*index of the boolean Series and of the indexed object do not match*" , try resetting the index – anky Aug 01 '19 at 07:20
  • you can get help regarding this from [here](https://stackoverflow.com/a/28902170/11548219) – Shafiqa Iqbal Aug 01 '19 at 07:36
  • In your sample you need `new= firstDF[~firstDF.Name.isin(secondDF.name)]`. Also, do the column values not match between the two DataFrames? For example, do you need to match `SourceSeq` from the first with `Unnamed: 0` from the second? – jezrael Aug 01 '19 at 07:46

1 Answer

The solution is to change the mask: compare secondDF.Name against a column from firstDF. In the sample data that is the MapInfo column; in the real data it seems to be the Name column. Because secondDF is the DataFrame being filtered, the boolean mask must be built from secondDF so that it has the same size and index values as secondDF:

new= secondDF[~secondDF.Name.isin(firstDF.MapInfo)] 
print (new)
   Chr MapInfo Name SourceSeq Unnamed: 0   rs
2    8      A8   B8        C8         D8   E8
3   10     A10  B10       C10        D10  E10
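
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds both frames from the tables in the question. The integer dtype of `Chr` in secondDF and the helper names `mask`, `keys`, `merged` and `newDF_merge` are assumptions made for illustration. The key point is that a mask built from firstDF carries firstDF's index rather than secondDF's, which appears to be the source of the IndexingError in the question. The second half shows one possible merge-based alternative (a left merge with `indicator=True`, used as an anti-join), since a merge was mentioned in the question.

```python
import pandas as pd

# Sample frames reconstructed from the tables in the question.
firstDF = pd.DataFrame({
    "rs": [1, 2, 3, 4, 5],
    "Chr": ["A1", "A2", "A3", "A4", "A5"],
    "MapInfo": ["B1", "B2", "B3", "B4", "B5"],
    "Name": ["C1", "C2", "C3", "C4", "C5"],
    "SourceSeq": ["D1", "D2", "D3", "D4", "D5"],
})
secondDF = pd.DataFrame({
    "Chr": [1, 4, 8, 10],  # assumed to be integers
    "MapInfo": ["A1", "A4", "A8", "A10"],
    "Name": ["B1", "B4", "B8", "B10"],
    "SourceSeq": ["C1", "C4", "C8", "C10"],
    "Unnamed: 0": ["D1", "D4", "D8", "D10"],
    "rs": ["E1", "E4", "E8", "E10"],
})

# The mask is computed on secondDF, so it shares secondDF's index and length.
# In the sample data, secondDF.Name lines up with firstDF.MapInfo.
mask = ~secondDF["Name"].isin(firstDF["MapInfo"])
newDF = secondDF[mask]
print(newDF)

# Alternative sketch: left merge on the key column only, then keep the rows
# that found no partner in firstDF ("left_only").
keys = (
    firstDF[["MapInfo"]]
    .drop_duplicates()
    .rename(columns={"MapInfo": "Name"})
)
merged = secondDF.merge(keys, on="Name", how="left", indicator=True)
newDF_merge = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")
print(newDF_merge)
```

Both variants keep only the rows of secondDF whose Name is absent from the chosen firstDF column. With real data where the Name columns do correspond, `firstDF["Name"]` would replace `firstDF["MapInfo"]`.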
jezrael