Edit: SQL doesn't work for this. I just found out about Solr/Sphinx and it seems like the right tool for this problem, so if you know Solr or Sphinx I'm eager to hear from you.
Basically, I have a .tsv with patent info and a .csv with product names. I need to match each row of the patents column against the product names and extract the occurrences in a new .csv column.
You can scroll down and see the example at the end.
Original question:
SQL newbie here so bear with me :). I can't figure out how to do this:
My database:
mysql> SHOW TABLES;
+-----------------------+
| Tables_in_prodpatdb |
+-----------------------+
| assignee |
| patents |
| patent_info |
| products |
+-----------------------+
mysql> DESCRIBE patents;
+-------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| ... | | | | | |
| patent_id | varchar(20) | YES | | NULL | |
| text | text | YES | | NULL | |
| ... | | | | | |
+-------------+-------------+------+-----+---------+-------+
mysql> DESCRIBE products;
+-------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| name | text | YES | | NULL | |
+-------------+-------------+------+-----+---------+-------+
I have to work with the columns name
and text
, they look like this:
name
product1
product2
product3
...
~10M rows
text
long text description 1
long text description 2
long text description 3
...
~88M rows
I need to check patents.text
row 1 and match it against products.name
column to find every product name in that row, then store those products names in a new table. Then check row 2 and repeat.
If a patents.text
row has a product name several times only copy it to the new table once. If some row has no product names just skip it. The output should be something like this:
Operation Product
1 prod5, prod6
2 prod7
...
An example:
name
valve
a/c fan
farmed salmon
...
text
This patent deals with a new approach to air-conditioned fan. With some new valve the a/c fan is
so much better. The new valve is great.
This patent has no product names in it.
This patent talks about farmed salmon.
...
Desired output:
Operation Product
1 valve, a/c fan
2 farmed salmon
...