I have table1, which can range from <100 to about 100,000 rows. It contains 22 columns, one of which is a Description containing a string where I want to search some terms.
table1
+----------+--------+--------+--------+-------------+------+-------+-------+
| UniqueId | Cat1Id | Cat2Id | Cat3Id | Col4 | Col5 | Col21 | Col22 |
+----------+--------+--------+--------+-------------+------+-------+-------+
| 1 | 123 | | 33 | Description | | | |
| 2 | 11 | 22 | | Description | | | |
| 3 | 22 | 22 | 33 | Description | | | |
| 4 | 33 | | 455 | Description | | | |
| 5 | 111 | | | Description | | | |
| | | | | | | | |
+----------+--------+--------+--------+-------------+------+-------+-------+
table2 below contains substrings in col Term that I want to search in Col4 above based on the Id's. This can contain hundreds of rows.
table2
+--------+--------+--------+------+------+------+
| Cat1Id | Cat2Id | Cat3Id | Term | Val1 | Val2 |
+--------+--------+--------+------+------+------+
| 123 | | 33 | abc | 555 | 66 |
| 123 | | 33 | bca | 444 | 55 |
| 11 | 22 | | blah | 888 | 77 |
| 33 | | 455 | dddd | 999 | 77 |
| 33 | | 455 | aaaa | 777 | 22 |
| | | | | | |
+--------+--------+--------+------+------+------+
I need to save the Val1 and Val2 for each row in table1. So in the end, I want something like:
table3 - final
+----------+--------+--------+--------+-------------+------+------+------+
| UniqueId | Cat1Id | Cat2Id | Cat3Id | Col4 | Term | Val1 | Val2 |
+----------+--------+--------+--------+-------------+------+------+------+
| 1 | 123 | | 33 | Description | abc | 555 | 66 |
| 1 | 123 | | 33 | Description | bca | 444 | 55 |
| 2 | 11 | 22 | | Description | blah | 888 | 77 |
| 4 | 33 | | 455 | Description | dddd | 999 | 77 |
| 4 | 33 | | 455 | Description | aaaa | 777 | 22 |
| | | | | | | | |
+----------+--------+--------+--------+-------------+------+------+------+
My plan is to do this:
For each unique Cat1, 2, 3 combo in table2, query table1 for the subset of rows that contain the Cat1Id, Cat2Id, Cat3Id.
Go back to table2 and make a dictionary with the Term as key, and Val1 and Val2 as values.
Iterate over the rows from #1 above, search through each description each term. If I get a hit, then grab that row, assign Val1 and Val2, and append all this data to a new df. Repeat.
Is there a faster way of doing this? If I have 100k rows, in your experience will this go slow or ok? A couple minutes of run time is fine for me.