Extracting multiple string values of variable length before and after a delimiter in a list

Question

I have several Python lists in the following format:

rating = ['What is your rating for?: Bob', 'What is your rating for?: Alice', 'What is your rating for?: Mary Jane']

opinion = ['What is your opinion of?: Bob', 'What is your opinion of?: Alice', 'What is your opinion of?: Mary Jane']

I am trying to write a function that will evaluate a given list and generate two data structures from it:

a list of the names that appear after the colons (:)
a string variable that has the text that is repeated before the colons (:)

Ideally, both items would be named based off of the original list name. Also, the delimiter and the first space after it should be ignored.

Desired sample output for the two above examples:

rating_names = ['Bob', 'Alice', 'Mary Jane']
rating_text = 'What is your rating for?'

opinion_names = ['Bob', 'Alice', 'Mary Jane']
opinion_text = 'What is your opinion of?'

I've been able to make this work for a single list by removing a fixed string from each list item, but haven't quite figured out how to make it work for a variable number of characters before the delimiter and the potential of a two word name (e.g. 'Mary Jane') after it.

rating_names = [s.replace('What is your rating for?: ', '') for s in rating]

After searching, it appears that a regular-expression feature such as look-ahead might be the solution, but I can't get that to work either.

Elazar · Accepted Answer · 2013-05-16T09:24:58.280

Use str.split():

>>> 'What is your rating for?: Bob'.split(': ')
['What is your rating for?', 'Bob']

To get the text and names:

>>> def get_text_name(arg):
...     temp = [x.split(': ') for x in arg]
...     return temp[0][0], [t[1] for t in temp]
...
>>> rating_text, rating_names = get_text_name(rating)
>>> rating_text
'What is your rating for?'
>>> rating_names
['Bob', 'Alice', 'Mary Jane']
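
One caveat with splitting on ': ' is that an item containing the delimiter twice yields more than two pieces, and an item without it yields only one. A minimal sketch, assuming only the first ': ' should count as the delimiter, could use str.partition instead. The names split_question and get_text_and_names are hypothetical helpers, not part of the answer above:

```python
def split_question(item, sep=': '):
    """Split one item at the first occurrence of sep.

    Hypothetical helper: returns (text, name); name is '' when sep is absent.
    """
    text, _, name = item.partition(sep)
    return text, name


def get_text_and_names(items, sep=': '):
    """Return the shared leading text and the list of names."""
    pairs = [split_question(item, sep) for item in items]
    # Assumes every item shares the same leading text, as in the question.
    return pairs[0][0], [name for _, name in pairs]


rating = ['What is your rating for?: Bob',
          'What is your rating for?: Alice',
          'What is your rating for?: Mary Jane']
rating_text, rating_names = get_text_and_names(rating)
print(rating_text)   # What is your rating for?
print(rating_names)  # ['Bob', 'Alice', 'Mary Jane']
```

Unlike a plain split, partition always returns three parts, so unpacking never fails on an item that lacks the delimiter.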

To get "variables" (you probably mean a dict, as has been said here):

>>> def get_text_name(arg):
...     temp = [x.split(': ') for x in arg]
...     return temp[0][0].split()[-2], [t[1] for t in temp]
... 
>>> text_to_name=dict([get_text_name(x) for x in [rating, opinion]])
>>> text_to_name
{'rating': ['Bob', 'Alice', 'Mary Jane'], 'opinion': ['Bob', 'Alice', 'Mary Jane']}

Thanks Elazar. Any suggestions on dynamically generating the _text and _names variables based on the input for the function? — Daniel Romero, May 16 '13 at 02:22
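
On the follow-up about the _text and _names names: a function cannot see the name of the variable passed to it, so one common approach is to pass a label explicitly and store the results in a dict under keys built from that label. A minimal sketch under that assumption follows; extract_named and the key format are illustrative choices, not something taken from the answer:

```python
def extract_named(lists_by_label, sep=': '):
    """Build '<label>_text' and '<label>_names' entries for each labelled list."""
    out = {}
    for label, items in lists_by_label.items():
        # Split at the first sep only, so names may themselves contain spaces.
        pairs = [item.split(sep, 1) for item in items]
        # Assumes each item contains sep and all items share the same text.
        out[label + '_text'] = pairs[0][0]
        out[label + '_names'] = [pair[1] for pair in pairs]
    return out


rating = ['What is your rating for?: Bob',
          'What is your rating for?: Alice',
          'What is your rating for?: Mary Jane']
opinion = ['What is your opinion of?: Bob',
           'What is your opinion of?: Alice',
           'What is your opinion of?: Mary Jane']

results = extract_named({'rating': rating, 'opinion': opinion})
print(results['rating_names'])  # ['Bob', 'Alice', 'Mary Jane']
print(results['opinion_text'])  # What is your opinion of?
```

Looking values up as results['rating_names'] keeps the naming scheme from the question without creating variables at runtime.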

score 1 · Answer 2 · answered May 16 '13 at 02:12

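import re

def gr(l):
    # Use dict keys as sets of the distinct questions and names.
    dq, ds = dict(), dict()
    for t in l:
        # Group 1: text up to the '?'; group 2: text after the colon.
        for q, s in re.findall(r"(.*\?)\s*:\s*(.*)$", t):
            dq[q] = ds[s] = 1
    return list(dq.keys()), list(ds.keys())

l = [gr(rating), gr(opinion)]
print(l)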
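
To see what the pattern captures on a single string, here is a small sketch that applies the same expression with named groups. The group names text and name and the helper parse are my own labels, added only for illustration:

```python
import re

# Same pattern as above, with named groups added for readability.
PATTERN = re.compile(r"(?P<text>.*\?)\s*:\s*(?P<name>.*)$")


def parse(item):
    """Return (text, name), or None when the item does not match."""
    m = PATTERN.match(item)
    if m is None:
        return None
    return m.group('text'), m.group('name')


print(parse('What is your rating for?: Mary Jane'))
# ('What is your rating for?', 'Mary Jane')
print(parse('no question mark here'))
# None
```

Because the pattern requires a '?' before the colon, it only fits items phrased as questions; the split-based approaches do not have that restriction.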

perreal


score 0 · Answer 3 · answered May 16 '13 at 03:50

If you have a large number of lists to process, you may consider putting the data directly into a dictionary. This might help address your question to Elazar.

Code

def dict_gen(d, l):
    for s in l:
        question, name = s.split(': ')
        if question not in d:
            d[question] = []    
        d[question].append(name)

Usage

rating = ['What is your rating for?: Bob', 'What is your rating for?: Alice', 'What is your rating for?: Mary Jane']
opinion = ['What is your opinion of?: Bob', 'What is your opinion of?: Alice', 'What is your opinion of?: Mary Jane']

results = {}
dict_gen(results, rating)
dict_gen(results, opinion)

for key, value in results.items():
    print(key, value)

Yields

What is your rating for? ['Bob', 'Alice', 'Mary Jane']
What is your opinion of? ['Bob', 'Alice', 'Mary Jane']
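
The same grouping can be written a little more compactly with collections.defaultdict, which creates the empty list on first access. This is a sketch of that variant, not the answer's own code; group_by_question is a hypothetical name, and it assumes every item contains the ': ' delimiter:

```python
from collections import defaultdict


def group_by_question(lists, sep=': '):
    """Map each question text to the list of names that follow it."""
    grouped = defaultdict(list)
    for items in lists:
        for item in items:
            # Split at the first sep only; raises ValueError if sep is missing.
            question, name = item.split(sep, 1)
            grouped[question].append(name)
    return dict(grouped)


rating = ['What is your rating for?: Bob',
          'What is your rating for?: Alice',
          'What is your rating for?: Mary Jane']
opinion = ['What is your opinion of?: Bob',
           'What is your opinion of?: Alice',
           'What is your opinion of?: Mary Jane']

results = group_by_question([rating, opinion])
for key, value in results.items():
    print(key, value)
```

Returning a plain dict at the end avoids surprising callers, since looking up a missing key on a defaultdict would silently insert it.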
