I have a large df (df1) with binary values in each column, like so:
df1:
a b c d
1 1 0 1 0
2 0 0 0 0
3 0 1 0 1
4 1 1 0 0
5 1 0 0 0
6 1 0 1 1
...
I also have another, smaller df (df2) with some "template" rows, and I want to check which of these templates df1's rows contain. The templates look like this:
df2:
a b c d
1 1 0 1 0
2 1 1 1 1
3 0 0 0 1
4 1 1 0 0
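For reference, the two example frames can be constructed like this:

    import pandas as pd

    df1 = pd.DataFrame(
        {"a": [1, 0, 0, 1, 1, 1],
         "b": [0, 0, 1, 1, 0, 0],
         "c": [1, 0, 0, 0, 0, 1],
         "d": [0, 0, 1, 0, 0, 1]},
        index=[1, 2, 3, 4, 5, 6])

    df2 = pd.DataFrame(
        {"a": [1, 1, 0, 1],
         "b": [0, 1, 0, 1],
         "c": [1, 1, 0, 0],
         "d": [0, 1, 1, 0]},
        index=[1, 2, 3, 4])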
What I'm trying to do is search the large df efficiently for this small number of templates. In this example, rows 1, 3, 4, and 6 would match, but rows 2 and 5 would not. A row in the large df should still pass the test if it has extra 1s, i.e. it contains all of a template row's 1s plus some additional 1s.
I know that I could just use a nested loop, iterating over all the rows of both dfs and comparing rows as np.arrays, but that seems like an extremely inefficient way to do this. I'm wondering if there are any non-iterative pandas-based solutions to this problem?
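For concreteness, the nested-loop version I'd like to avoid looks roughly like this (just a sketch; matches is an illustrative name):

    import numpy as np

    # For each row of df1, a template matches if every position where the
    # template has a 1 is also a 1 in the row (extra 1s in the row are OK).
    matches = {}
    for i, row in df1.iterrows():
        matches[i] = [j for j, tmpl in df2.iterrows()
                      if np.all(row.values >= tmpl.values)]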
Thank you so much!
Minor functionality edit: Along with searching and matching, I also want to retain, for each row in df1, a list of which template rows from df2 it matched, so I can do statistics on how many templates show up in the large df and which ones they are. This is one of the reasons why this answer (Compare Python Pandas DataFrames for matching rows) doesn't work.
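To illustrate on the example above, the result I'm after would look something like this (the exact container doesn't matter):

    # df1 row index -> df2 template rows whose 1s it contains
    {1: [1], 2: [], 3: [3], 4: [4], 5: [], 6: [1, 3]}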