I'm trying to find matching values in a pandas dataframe. Once a match is found I want to perform some operations on the row of the dataframe.
Currently I'm using this Code:
import pandas as pd
d = {'child_id': [1, 2,5,4], 'parent_id': [3, 4,2,3], 'content': ["a","b","c","d"]}
df = pd.DataFrame(data=d)
for i in range(len(df)):
for j in range(len(df)):
if str(df['child_id'][j]) == str(df['parent_id'][i]):
print(df.content[i])
else:
pass
It works fine, but is rather slow. Since I'm dealing with a dataset with millions of rows, I would take months. Is there a faster way to do this?
Edit: To clarify what, I want to create is a dataframe, which contains the Content of Matches.
import pandas as pd
d = {'child_id': [1,2,5,4],
'parent_id': [3,4,2,3],
'content': ["a","b","c","d"]}
df = pd.DataFrame(data=d)
df2 = pd.DataFrame(columns = ("content_child", "content_parent"))
for i in range(len(df)):
for j in range(len(df)):
if str(df['child_id'][j]) == str(df['parent_id'][i]):
content_child = str(df["content"][i])
content_parent = str(df["content"][j])
s = pd.Series([content_child, content_parent], index=['content_child', 'content_parent'])
df2 = df2.append(s, ignore_index=True)
else:
pass
print(df2)