Convert a set of tuples into values

Question

I'm working on an NLP project in which I need to parse tags. Each tag is a string that looks like a set of tuples. Example:

'{(Entertainment (Adult), S), (Performing Arts, S), (Comedy Club, S), ($, S), (Comedy, P), (18+, S), (Plays & Shows, P)}'

But I want it to look like this:

{('Entertainment (Adult)', 'S'), ('Performing Arts', 'S'), ('Comedy Club', 'S'), ('$', 'S'), ('Comedy', 'P'), ('18+', 'S'), ('Plays & Shows', 'P')}

I tried using literal_eval per this question, but I get an invalid syntax error. I think this is because the tag is a set of tuples whose elements are not quoted as strings, so literal_eval can't parse them (just guessing here).
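
For reference, ast.literal_eval only accepts Python literal syntax. In the raw tag the words are unquoted, so they would be read as names rather than strings, and `$` is not valid Python at all, which is where the syntax error comes from. The following is a minimal sketch of one workaround, assuming every tuple ends in `, <word>)`: quote the two fields with a regex first, then pass the resulting valid set literal to literal_eval. The helper name `quote_then_eval` is hypothetical.

```python
import ast
import re


def quote_then_eval(tag_string):
    """Hypothetical helper: quote each field, then parse with literal_eval."""
    # Wrap the name and flag of every "(name, FLAG)" in Python string
    # literals; repr() takes care of escaping any quotes inside the name.
    quoted = re.sub(
        r"\((.+?),\s*(\w+)\)(?=\s*[,}])",
        lambda m: "({!r}, {!r})".format(m.group(1), m.group(2)),
        tag_string,
    )
    # quoted now looks like "{('Comedy', 'P'), ...}", a valid set literal.
    return ast.literal_eval(quoted)


tag = '{(Entertainment (Adult), S), ($, S), (Plays & Shows, P)}'
try:
    ast.literal_eval(tag)
except (SyntaxError, ValueError):
    # The raw string is not a Python literal, so parsing fails.
    print("literal_eval cannot parse the raw tag string")
print(quote_then_eval(tag))
```

Using repr() rather than adding quotes by hand means a name such as `Rock 'n' Roll` still produces a valid literal.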

I tried some band-aid string stripping and splitting, but I can't find a solution that works for arbitrary tags.

What if tags contain commas or parentheses? Wouldn't it be simpler to generate the list properly in the first place? – Jean-François Fabre, Oct 17 '17 at 14:32
Tags will always be in the same form: a set of tuples containing two values. Another example tag is '{(All Ages, S), ($, S), (Alternative & Rock, S), (Concerts & Live Music, P)}'. – Daniel, Oct 17 '17 at 14:33
I would try splitting by commas first and then joining pairs. – Adirio, Oct 17 '17 at 14:36

Ajax1234 · Accepted Answer · score 2 · 2017-10-17T15:04:55.347

You can use regular expressions:

import re
s = '{(Entertainment (Adult), S), (Performing Arts, S), (Comedy Club, S), ($, S), (Comedy, P), (18+, S), (Plays & Shows, P)}'
# Each tuple looks like "(name, X)". Match the name lazily up to the
# ", X)" that closes the tuple, so "(Adult)" stays part of the name.
final_data = re.findall(r"\((.*?),\s*(\w+)\)", s)
new_final_data = set(final_data)
print(new_final_data)

Output (sets are unordered, so the order may differ):

{('Entertainment (Adult)', 'S'), ('Performing Arts', 'S'), ('Comedy Club', 'S'), ('$', 'S'), ('Comedy', 'P'), ('18+', 'S'), ('Plays & Shows', 'P')}

edited Oct 17 '17 at 15:04

answered Oct 17 '17 at 14:43

Ajax1234


This works, but it returns nothing for strings with a single tag, for example '{(Concerts & Live Music, P)}'. – Daniel, Oct 17 '17 at 14:59
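
A minimal sketch of a reusable helper built on the same regex idea, assuming every tuple ends in `, <flag>)` where the flag is a single word such as `S` or `P`. The function name `parse_tags` is hypothetical. The name is matched lazily up to the `, <flag>)` that closes the tuple, and a lookahead requires that `)` to be followed by `,` or `}`. This keeps parentheses and ordinary commas inside a name intact, and a string with a single tag is handled the same way as one with many. A name that itself contains something like `(a, b)` would still be split wrongly.

```python
import re

# Assumption: each tuple has the form "(name, FLAG)" where FLAG is one word
# (letters/digits/underscore). The lookahead requires the closing ")" to be
# followed by "," or "}", so a ")" inside the name is not taken as the end.
TAG_PATTERN = re.compile(r"\((.+?),\s*(\w+)\)(?=\s*[,}])")


def parse_tags(tag_string):
    """Turn '{(name, F), ...}' into a set of (name, flag) string tuples."""
    return set(TAG_PATTERN.findall(tag_string))


print(parse_tags('{(Concerts & Live Music, P)}'))
# {('Concerts & Live Music', 'P')}
```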

score 1 · Answer 2 · answered Oct 17 '17 at 14:46

I would do it this way:

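original = '{(Entertainment (Adult), S), (Performing Arts, S), (Comedy Club, S), ($, S), (Comedy, P), (18+, S), (Plays & Shows, P)}'

# Drop the surrounding "{" and "}" and split on every comma
parts = original[1:-1].split(',')

# Remove leading/trailing whitespace from each piece
parts = [x.strip() for x in parts]

grouped = []

# Walk the pieces two at a time: the first of each pair starts with "("
# and the second ends with ")"
for i in range(0, len(parts), 2):
    grouped.append((parts[i][1:], parts[i+1][:-1]))

print(grouped)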

Use whatever variable names you prefer. I first use [1:-1] to delete the first and last characters ({ and }) and then split by commas. I then .strip() every part to remove leading and trailing whitespace. Finally, I iterate over the list with a step of 2, deleting the first character of each name (the opening "(") and the last character of each flag (the closing ")"), and append the resulting tuple to a new list. Note that this relies on no tag name containing a comma.
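
The same split-and-pair idea can be written with slicing and zip, returning a set as the question asked. This is a sketch under the same assumption as the answer above: no tag name contains a comma, because every comma is treated as a field separator. `split_pairs` is a hypothetical name.

```python
def split_pairs(tag_string):
    """Hypothetical helper: pair up comma-separated fields into tuples."""
    # Drop the surrounding braces, split on commas and trim whitespace.
    parts = [p.strip() for p in tag_string.strip()[1:-1].split(",")]
    # Even-index items start with "(" (the names); odd-index items end
    # with ")" (the flags). Strip those characters, then zip them together.
    names = [p[1:] for p in parts[0::2]]
    flags = [p[:-1] for p in parts[1::2]]
    return set(zip(names, flags))


print(split_pairs('{(Comedy Club, S), (Comedy, P)}'))
```

Slicing with a step of 2 replaces the explicit index loop, and zip stops at the shorter list, so a malformed string with an odd number of fields silently drops the last one rather than raising an IndexError.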

2 Answers