
I have two CSVs full of contact information. My aim is to check whether the data from any cell in the first one appears anywhere in the second one, and if so, to copy the whole row in which the info was found and append it to a new DataFrame. Is there any way to get the output as a well-formatted DataFrame?

registered = []
for x in range(df1.shape[0]):              # rows of df1
    for y in range(df1.shape[1]):          # columns of df1
        cell = str(df1.iat[x, y]).lower()
        for z in range(df2.shape[0]):      # rows of df2
            for r in range(df2.shape[1]):  # columns of df2
                if cell in str(df2.iat[z, r]).lower():
                    registered.append(df2.iloc[z])  # keep the matching row

df3 = pd.DataFrame(registered)
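A minimal runnable sketch of the cell-by-cell scan described above, on made-up toy data (the column names `name`/`email` and `contact`/`mail` are invented for illustration):

```python
import pandas as pd

# Hypothetical contact lists; column names are made up for illustration.
df1 = pd.DataFrame({"name": ["Alice", "Bob"], "email": ["a@x.com", "b@y.com"]})
df2 = pd.DataFrame({"contact": ["bob", "Carol"], "mail": ["b@y.com", "c@z.com"]})

registered = []
for x in range(df1.shape[0]):
    for y in range(df1.shape[1]):
        cell = str(df1.iat[x, y]).lower()
        for z in range(df2.shape[0]):
            for r in range(df2.shape[1]):
                if cell in str(df2.iat[z, r]).lower():
                    registered.append(df2.iloc[z])  # row of df2 containing the hit

# A row can match on several cells, so drop exact repeats at the end.
df3 = pd.DataFrame(registered).drop_duplicates()
print(df3)
```

Here "bob" and "b@y.com" both hit the first row of `df2`, so that row is appended twice and deduplicated to a single row in `df3`.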
    Why don't you post example dataframes as well? – J...S May 30 '18 at 08:44
  • To make queries in dataframes, have a look at this question: https://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas/17071908#17071908 – nahusznaj May 30 '18 at 08:58

2 Answers


Try this:

df3 = pd.concat([df1, df2])
result = df3.drop_duplicates()
result.to_excel("OF.xls")
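On made-up frames with the same columns, this sketch shows what the concat-and-dedupe step produces: the union of the two lists with exact-duplicate rows collapsed, not just the overlap (the frames and values here are invented for illustration):

```python
import pandas as pd

# Hypothetical contact lists sharing a schema.
df1 = pd.DataFrame({"name": ["Alice", "Bob"], "email": ["a@x.com", "b@y.com"]})
df2 = pd.DataFrame({"name": ["Bob", "Carol"], "email": ["b@y.com", "c@z.com"]})

df3 = pd.concat([df1, df2])     # Alice, Bob, Bob, Carol
result = df3.drop_duplicates()  # Alice, Bob, Carol
print(result)
```

Bob's row appears in both inputs, so `drop_duplicates()` keeps only one copy; the result still contains all three distinct contacts.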

If I understand you correctly, you want to extract the records that exist in both DataFrames? If so, the approach from Kallol is not the worst idea, but instead of removing the duplicated entries we want to keep only the duplicated entries:

df_result = pd.concat([df1, df2])
df_result = df_result[df_result.duplicated()]

And if you want to do that based on one or more columns, you can specify them through the subset parameter of the duplicated() method:

df_result = pd.concat([df1, df2])
df_result = df_result[df_result.duplicated(subset="my_foreign_key_column")]
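A small runnable sketch of the subset variant on made-up data, using a hypothetical "email" column in place of "my_foreign_key_column":

```python
import pandas as pd

# Invented frames; "email" stands in for the foreign-key column.
df1 = pd.DataFrame({"name": ["Alice", "Bob"], "email": ["a@x.com", "b@y.com"]})
df2 = pd.DataFrame({"name": ["Robert", "Carol"], "email": ["b@y.com", "c@z.com"]})

df_result = pd.concat([df1, df2])
# duplicated() marks the *second* occurrence by default (keep="first"),
# so this keeps the df2-side row for every email seen in both frames.
df_result = df_result[df_result.duplicated(subset="email")]
print(df_result)
```

Here "b@y.com" appears in both frames under different names, so matching on the full row would miss it; matching on the subset keeps the "Robert" row.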
meow