
I have two CSVs full of contact information. My aim is to check whether the data from any cell in the first one appears anywhere in the second one, and if so, to copy the whole row in which the info was found and append it to a new DataFrame. Is there any way to get the output as a well-formatted DataFrame?

registered = []
for x in range(df1.shape[0]):              # rows of df1
    for y in range(df1.shape[1]):          # columns of df1
        cell = str(df1.iat[x, y]).lower()
        for z in range(df2.shape[0]):      # rows of df2
            for r in range(df2.shape[1]):  # columns of df2
                if cell in str(df2.iat[z, r]).lower():
                    registered.append(df2.iloc[z])  # keep the matching row

df3 = pd.DataFrame(registered)
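A minimal runnable sketch of the cell-by-cell scan described above, on made-up toy data (the column names `name`/`email` and `contact`/`mail` are invented for illustration):

```python
import pandas as pd

# Hypothetical contact lists; column names are made up for illustration.
df1 = pd.DataFrame({"name": ["Alice", "Bob"], "email": ["a@x.com", "b@y.com"]})
df2 = pd.DataFrame({"contact": ["bob", "Carol"], "mail": ["b@y.com", "c@z.com"]})

registered = []
for x in range(df1.shape[0]):
    for y in range(df1.shape[1]):
        cell = str(df1.iat[x, y]).lower()
        for z in range(df2.shape[0]):
            for r in range(df2.shape[1]):
                if cell in str(df2.iat[z, r]).lower():
                    registered.append(df2.iloc[z])  # row of df2 containing the hit

# A row can match on several cells, so drop exact repeats at the end.
df3 = pd.DataFrame(registered).drop_duplicates()
print(df3)
```

Here "bob" and "b@y.com" both hit the first row of `df2`, so that row is appended twice and deduplicated to a single row in `df3`.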
    Why don't you post example dataframes as well? – J...S May 30 '18 at 08:44
  • To make queries in dataframes, have a look at this question: https://stackoverflow.com/questions/17071871/select-rows-from-a-dataframe-based-on-values-in-a-column-in-pandas/17071908#17071908 – nahusznaj May 30 '18 at 08:58

2 Answers


Try this:

df3 = pd.concat([df1, df2])
result = df3.drop_duplicates()
result.to_excel("OF.xls")
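On made-up frames with the same columns, this sketch shows what the concat-and-dedupe step produces: the union of the two lists with exact-duplicate rows collapsed, not just the overlap (the frames and values here are invented for illustration):

```python
import pandas as pd

# Hypothetical contact lists sharing a schema.
df1 = pd.DataFrame({"name": ["Alice", "Bob"], "email": ["a@x.com", "b@y.com"]})
df2 = pd.DataFrame({"name": ["Bob", "Carol"], "email": ["b@y.com", "c@z.com"]})

df3 = pd.concat([df1, df2])     # Alice, Bob, Bob, Carol
result = df3.drop_duplicates()  # Alice, Bob, Carol
print(result)
```

Bob's row appears in both inputs, so `drop_duplicates()` keeps only one copy; the result still contains all three distinct contacts.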

If I understand you correctly, you want to extract the records that exist in both DataFrames? If so, the approach from Kallol is not the worst idea, but instead of removing the duplicated entries we want to keep only the duplicated entries:

df_result = pd.concat([df1, df2])
df_result = df_result[df_result.duplicated()]

And if you want to do that based on one or more columns, you can specify them through the subset parameter of the duplicated() method:

df_result = pd.concat([df1, df2])
df_result = df_result[df_result.duplicated(subset="my_foreign_key_column")]
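A small runnable sketch of the subset variant on made-up data, using a hypothetical "email" column in place of "my_foreign_key_column":

```python
import pandas as pd

# Invented frames; "email" stands in for the foreign-key column.
df1 = pd.DataFrame({"name": ["Alice", "Bob"], "email": ["a@x.com", "b@y.com"]})
df2 = pd.DataFrame({"name": ["Robert", "Carol"], "email": ["b@y.com", "c@z.com"]})

df_result = pd.concat([df1, df2])
# duplicated() marks the *second* occurrence by default (keep="first"),
# so this keeps the df2-side row for every email seen in both frames.
df_result = df_result[df_result.duplicated(subset="email")]
print(df_result)
```

Here "b@y.com" appears in both frames under different names, so matching on the full row would miss it; matching on the subset keeps the "Robert" row.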
meow