I read in information from a pandas DataFrame. The column "keywords" can, but doesn't have to, contain comma-separated keywords that I later want to search for in a text. This is easy if I only have one list of keywords to iterate over and look for in the text. However, I need a list for every row. How do I do that?

The input is the following DataFrame (df):

Search  keywords
 1      Smurf, gummybear, Echo
 2      Blue, yellow, red
 3      Apple, Orange, Pear

l_search = df['Search'].tolist()
l_kw = df['keywords'].tolist()

Now I have a list of comma-separated keyword strings, one per search. I want to split that up into as many lists as I have searches, basically:

i = 1
for kw in l_kw:
    l_kw_i = []  # pseudocode: the name should become l_kw_1, then l_kw_2, ...
    l_kw_i.append(kw)
    i = i + 1
# l_kw_1 would now be ["Smurf, gummybear, Echo"].

After that I would like to split each list at the commas, so l_kw_1 would now contain "Smurf", "gummybear", "Echo". I would then iterate over the results of each search and the respective list to determine if at least one keyword appears.

The main problem is creating a variable number of keyword lists, depending on how many searches there are.

user9092346
  • Use a dict to store the list for each row. You can even use a `defaultdict` so that the list is always initialized – tehhowch Jul 17 '19 at 13:51
  • Possible duplicate of [Changing variable names with Python for loops](https://stackoverflow.com/questions/1060090/changing-variable-names-with-python-for-loops) – Vikramaditya Gaonkar Jul 17 '19 at 13:55
  • Possible duplicate of [How do I create a variable number of variables?](https://stackoverflow.com/questions/1373164/how-do-i-create-a-variable-number-of-variables) – Akaisteph7 Jul 17 '19 at 14:01
  • Can you add an example of the desired output? – zipa Jul 17 '19 at 14:11

1 Answer

The trick is to use a dictionary keyed by row instead of a variable number of separately named lists. You can build it in one line using a dictionary comprehension combined with a list comprehension:

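import pandas as pd

# Sample data matching the question
df = pd.DataFrame({'Search': [1, 2, 3],
                   'keywords': ["Smurf, gummybear, Echo", "Blue, yellow, red", "Apple, Orange, Pear"]})

# Map each row index to its list of keywords, split at the commas
l_kw = {i: [y for y in x['keywords'].split(',')] for i, x in df.iterrows()}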

Output:

{0: ['Smurf', ' gummybear', ' Echo'],
 1: ['Blue', ' yellow', ' red'],
 2: ['Apple', ' Orange', ' Pear']}
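
Note that splitting on ',' alone keeps the space after each comma, and the question says the "keywords" column may be empty. The following is a minimal sketch of one way to handle both, not a definitive implementation. It assumes an empty cell shows up as None or NaN, keys the dictionary by the Search value rather than the row index, and uses a case-insensitive substring match for the "at least one keyword appears" check. The helper names split_keywords and has_keyword, the extra fourth row, and the sample text are hypothetical.

```python
import pandas as pd

# Same sample data as above, plus a hypothetical row without keywords.
df = pd.DataFrame({
    "Search": [1, 2, 3, 4],
    "keywords": ["Smurf, gummybear, Echo", "Blue, yellow, red",
                 "Apple, Orange, Pear", None],
})


def split_keywords(cell):
    """Return the stripped keywords of one cell, or [] if the cell is empty."""
    if not isinstance(cell, str):  # None or NaN: no keywords for this row
        return []
    return [kw.strip() for kw in cell.split(",") if kw.strip()]


# One keyword list per search, keyed by the Search value.
kw_by_search = {s: split_keywords(k)
                for s, k in zip(df["Search"], df["keywords"])}


def has_keyword(text, keywords):
    """True if at least one keyword occurs in text (case-insensitive substring)."""
    lowered = text.lower()
    return any(kw.lower() in lowered for kw in keywords)


print(kw_by_search[1])  # ['Smurf', 'gummybear', 'Echo']
print(has_keyword("A smurf walks into a bar", kw_by_search[1]))  # True
```

A substring match also hits inside longer words (for example "red" in "hundred"), so a word-boundary regular expression may be a better fit if that matters for your texts.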
marc_s
vlemaistre