
I want to delete a row from one dataset (df2) whenever the string in column "A" of that row does not appear in column "A" of another dataset (df1), for example:

df1:      df2:
A  B      A  B
h  1      j  5
i  2      h  3
j  6      p  3
g  2      t  1

In this case, looking only at column A of df2: since "j" and "h" also appear in column A of df1, those rows remain in df2. However, "p" and "t" do not appear anywhere in df1, so I would like those rows to be deleted from df2.

I tried to write a loop for this, but because I access each row by its integer index (0, 1, 2, 3, ...), and I had already deleted other rows for other reasons, many indexes are missing; the index starts, for example, at 120, then jumps to 127, and so on.
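A note on the non-contiguous index: pandas keeps the original row labels after deletions, so a loop counting 0, 1, 2, ... will miss rows. DataFrame.reset_index(drop=True) re-labels the rows as 0..n-1 again. A minimal sketch, using a hypothetical frame df:

import pandas as pd

df = pd.DataFrame({'A': ['h', 'i', 'j', 'g'], 'B': [1, 2, 6, 2]})
df = df.drop([1, 3])            # deleting rows leaves gaps: index is now 0, 2
df = df.reset_index(drop=True)  # re-label the remaining rows as 0, 1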

Can anyone help me in any way? Thank you very much in advance!

bonaqua

1 Answer


Here you go

import pandas as pd

df1 = pd.DataFrame({ 'A' : ['h','i','j','g'], 'B' : [1,2,6,2]})
df2 = pd.DataFrame({ 'A' : ['h','a','j','c'], 'B' : [1,2,6,2]})
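# note: '==' compares the two Series row by row, so their index labels must line up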
df2 = df2[df2['A'] == df1['A']]
High-Octane
  • It shows an error: "Can only compare identically-labeled Series objects" – bonaqua Jan 16 '20 at 11:20
  • This is the output I got from my code; it is working for me:

    ```
    import pandas as pd
    df1 = pd.DataFrame({ 'A' : ['h','i','j','g'], 'B' : [1,2,6,2]})
    df2 = pd.DataFrame({ 'A' : ['h','a','j','c'], 'B' : [1,2,6,2]})
    df2 = df2[df2['A'] == df1['A']]
    print(df2)
       A  B
    0  h  1
    2  j  6
    ```

    Can you share yours? – High-Octane Jan 16 '20 at 11:28
  • I am working with a large dataset, but I used that example to simplify the problem. In reality I am working with 7000 rows and 130 columns. Actually column "A", the one I am interested in, contains codes like "0zheb". I read the data from Excel and then split the dataset (df) in two: df1 (first 3500 rows) and df2 (last 3500 rows). The goal is to have in df2 only codes that appear in any of the rows of column "A" of df1. If a code does not appear there, its row is deleted from df2. – bonaqua Jan 16 '20 at 11:39
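Given that clarification, a membership test with pandas' Series.isin avoids comparing the two halves row by row, so it also works when df1 and df2 keep their original, non-matching row labels (which is what triggers the "identically-labeled" error above). A minimal sketch under those assumptions; the 8-row frame is a hypothetical stand-in for the real spreadsheet:

import pandas as pd

# hypothetical stand-in for the real 7000-row sheet
df = pd.DataFrame({'A': ['h', 'i', 'j', 'g', 'j', 'h', 'p', 't'],
                   'B': [1, 2, 6, 2, 5, 3, 3, 1]})

half = len(df) // 2
df1 = df.iloc[:half]   # first half of the rows
df2 = df.iloc[half:]   # second half keeps its original labels (4..7)

# keep only the rows of df2 whose code in column "A" also appears in df1["A"]
df2 = df2[df2['A'].isin(df1['A'])]
print(df2)   # rows with 'j' and 'h' remain; 'p' and 't' are dropped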