I have a dictionary that maps placeholders to lists of possible values, as shown below:
{
    "~GPE~": ['UK', 'USA'],
    "~PERSON~": ['John Davies', 'Tom Banton', 'Joe Morgan'],
    # and so on ...
}
I want to create all possible strings by replacing the placeholders (i.e. ~GPE~ and ~PERSON~) in the template:
"My name is ~PERSON~. I travel to ~GPE~ with ~PERSON~ every year."
The expected output is:
"My name is John Davies. I travel to UK with Tom Banton every year."
"My name is John Davies. I travel to UK with Joe Morgan every year."
"My name is John Davies. I travel to USA with Tom Banton every year."
"My name is John Davies. I travel to USA with Joe Morgan every year."
"My name is Tom Banton. I travel to UK with John Davies every year."
"My name is Tom Banton. I travel to UK with Joe Morgan every year."
"My name is Tom Banton. I travel to USA with John Davies every year."
"My name is Tom Banton. I travel to USA with Joe Morgan every year."
"My name is Joe Morgan. I travel to UK with Tom Banton every year."
"My name is Joe Morgan. I travel to UK with John Davies every year."
"My name is Joe Morgan. I travel to USA with Tom Banton every year."
"My name is Joe Morgan. I travel to USA with John Davies every year."
Note that the values for a given placeholder must not repeat within the same sentence. For example, I do not want: "My name is Joe Morgan. I travel to USA with Joe Morgan every year." So this is not exactly a Cartesian product, but it is close.
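To make the "not exactly a Cartesian product" requirement concrete, here is a minimal sketch of how the value tuples could be enumerated: ordered selections without repetition (permutations) within each placeholder, and a Cartesian product across placeholders. The `slots` dictionary is a hypothetical helper that I filled in by hand from the template; it is not derived automatically here, and no string substitution is attempted yet.

```python
from itertools import permutations, product

label_dict = {
    "~GPE~": ["UK", "USA"],
    "~PERSON~": ["John Davies", "Tom Banton", "Joe Morgan"],
}

# Assumption: number of times each placeholder occurs in the template,
# counted by hand for this example.
slots = {"~PERSON~": 2, "~GPE~": 1}

# For each placeholder: every ordered selection of distinct values.
per_label = [list(permutations(label_dict[label], n)) for label, n in slots.items()]

# Across placeholders: a plain Cartesian product.
assignments = list(product(*per_label))

print(len(assignments))  # 3 * 2 person pairs times 2 places = 12
print(assignments[0])    # (('John Davies', 'Tom Banton'), ('UK',))
```

This yields 12 assignments, which matches the number of expected sentences listed above.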
I am new to Python and have been experimenting with the re module, but I could not find a solution to this problem.
EDIT
The main problem I am facing is that replacing a placeholder changes the length of the string, which makes subsequent modifications difficult, especially because the same placeholder can appear multiple times in the string. The snippet below illustrates this:
import re

label_dict = {
    "~GPE~": ['UK', 'USA'],
    "~PERSON~": ['John Davies', 'Tom Banton', 'Joe Morgan']
}
template = "My name is ~PERSON~. I travel to ~GPE~ with ~PERSON~ every year."
for label in label_dict.keys():
    modified_string = template
    offset = 0
    for match in re.finditer(r'{}'.format(label), template):
        for label_text in label_dict.get(label, []):
            start, end = match.start() + offset, match.end() + offset
            offset += (len(label_text) - (end - start))
            # print ("Match was found at {start}-{end}: {match}".format(start = start, end = end, match = match.group()))
            modified_string = modified_string[: start] + label_text + modified_string[end: ]
            print(modified_string)
This gives the following incorrect output:
My name is ~PERSON~. I travel to UK with ~PERSON~ every year.
My name is ~PERSON~. I travel USA with ~PERSON~ every year.
My name is John Davies. I travel to ~GPE~ with ~PERSON~ every year.
My name is JohTom Banton. I travel to ~GPE~ with ~PERSON~ every year.
My name is JohToJoe Morgan. I travel to ~GPE~ with ~PERSON~ every year.
My name is JohToJoe Morgan. I travel to ~GPE~ with John Davies every year.
My name is JohToJoe Morgan. I travel to ~GPE~ with JohTom Banton every year.
My name is JohToJoe Morgan. I travel to ~GPE~ with JohToJoe Morgan every year.
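One possible way to avoid the offset bookkeeping altogether is to let `re.sub` do every replacement in a single pass with a replacement function, so the string is never sliced by hand. The sketch below is only an illustration under assumptions: `expand` is a hypothetical helper name, placeholders are taken to be exactly the dictionary keys, and values for the same placeholder are kept distinct by using `itertools.permutations`. The sentences may come out in a different order than in the expected output above.

```python
import re
from itertools import permutations, product


def expand(template, label_dict):
    """Yield every filled-in string (hypothetical helper, not a library function)."""
    # One pattern that matches any placeholder; re.escape guards special characters.
    pattern = re.compile("|".join(re.escape(label) for label in label_dict))
    found = pattern.findall(template)      # placeholders in order of appearance
    labels = list(dict.fromkeys(found))    # unique labels, first-appearance order
    # Per placeholder: ordered selections of distinct values, one per occurrence.
    choices = [permutations(label_dict[label], found.count(label)) for label in labels]
    for combo in product(*choices):
        # One iterator per placeholder, consumed left to right during substitution.
        pools = {label: iter(values) for label, values in zip(labels, combo)}
        yield pattern.sub(lambda m: next(pools[m.group()]), template)


label_dict = {
    "~GPE~": ["UK", "USA"],
    "~PERSON~": ["John Davies", "Tom Banton", "Joe Morgan"],
}
template = "My name is ~PERSON~. I travel to ~GPE~ with ~PERSON~ every year."

results = list(expand(template, label_dict))
for sentence in results:
    print(sentence)
```

With the data above this should print 12 sentences, none of which uses the same person twice. If a placeholder occurs more often than it has values, `permutations` produces nothing and no sentences are generated.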