For a list of strings that exist as rows within a table, I want to identify the frequency of those strings within rows of another data table in R. Simultaneously, I want to sum the values of rows that contain these strings.
For example, my reference table which contains a list of strings would look something like this:
+-----------------------------+
|String |
+-----------------------------+
|Dixon |
+-----------------------------+
|Nina Kraviz |
+-----------------------------+
|DJ Tennis |
+-----------------------------+
And my table that I want to analyze would look something like this:
+--------------------------------+
|String |Score |
+--------------------------------+
|Nina Kraviz @ Hyde |100 |
+--------------------------------+
|DJ Tennis? |200 |
+--------------------------------+
|From Dixon |100 |
+--------------------------------+
|From Kevin Saunderson |100 |
+--------------------------------+
|Dixon |300 |
+--------------------------------+
|Nina Kraviz |200 |
+--------------------------------+
I want my resulting table to look like this:
+---------------------------------+
|String |Score |
+---------------------------------+
|Dixon |400 |
+---------------------------------+
|Nina Kraviz |300 |
+---------------------------------+
|DJ Tennis |200 |
+---------------------------------+
I've tried using n-grams and tokenizing but it's not working in a way that makes it easy, as artists names can contain 1, 2 or 3 words commonly. Any help would be appreciated.