
I'm trying to combine two strings in a list, where the combination of the terms has a different meaning than the terms have when tokenized individually.

An example of this would be:

['I', 'want','to','join','a','football','team','in','2022']

The goal is to join the strings 'football' and 'team' with a _ whenever the two terms occur one after the other, resulting in the string football_team.

The final list would look like this:

['I', 'want','to','join','a','football_team','in','2022']

Any help is appreciated; so far I have only managed to join the whole list.

EDIT:

I have been trying to join terms using this: "Is there a way to combine two tokens from a list of tokens?"

I have also tried this, but it joins every element in the list: "How to concatenate items in a list to a single string?"

EDIT 2:

In answer to a question in the comments, "How would one understand which words would hold some meaning?"

I have a pre-defined list of string combinations that need to be merged.

msa
  • How would one understand which words would hold some meaning? – Devang Sanghani Feb 25 '22 at 11:04
  • *"I'm trying to"*: can you include what you have been trying, so we see where you are stuck? – trincot Feb 25 '22 at 11:04
  • You could keep track of the words that can be merged with a dictionary of type Dict[str, Set[str]]. In your for loop you could look at the next word, and if it's in the Set part of your dictionary you could merge those two words. – Christian Weiss Feb 25 '22 at 11:07
  • But be aware that a for loop can cause a subtle error if you remove one of the words after merging. I would use a while loop. – Christian Weiss Feb 25 '22 at 11:10
  • I think you want to search for some specific words from a list of strings and then join them in a single string, so you can use any one of the ML libraries. – RaviPatidar Feb 25 '22 at 13:14
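
The dictionary idea from the comments above could be sketched roughly as follows, assuming the pre-defined combinations are stored as a mapping from a first word to the set of words that may follow it. The names `MERGEABLE` and `merge_pairs` and the extra table entries are made up for illustration; they do not come from the question.

```python
from typing import Dict, List, Set

# Hypothetical lookup table: first word -> words that may follow it.
MERGEABLE: Dict[str, Set[str]] = {
    "football": {"team", "club"},
    "ice": {"cream"},
}


def merge_pairs(tokens: List[str], mergeable: Dict[str, Set[str]]) -> List[str]:
    """Return a new list in which known adjacent pairs are joined with '_'."""
    merged: List[str] = []
    i = 0
    # A while loop lets us skip two tokens at once after a merge.
    while i < len(tokens):
        followers = mergeable.get(tokens[i], set())
        if i + 1 < len(tokens) and tokens[i + 1] in followers:
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2  # both tokens of the pair are consumed
        else:
            merged.append(tokens[i])
            i += 1
    return merged


tokens = ['I', 'want', 'to', 'join', 'a', 'football', 'team', 'in', '2022']
print(merge_pairs(tokens, MERGEABLE))
# ['I', 'want', 'to', 'join', 'a', 'football_team', 'in', '2022']
```

Building a new list instead of deleting from the original avoids the index-shifting problem mentioned in the comments.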

2 Answers

2
" ".join(['I', 'want','to','join','a','football','team','in','2022']).replace("football team","football_team").split(" ")
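
One caveat with the one-liner above: `str.replace` matches substrings, so it also fires inside longer words, and it assumes that no token contains a space. A rough sketch of the same join/replace/split idea restricted to whole tokens, using `re` lookarounds, might look like this. The function name `merge_phrases` and the phrase list are assumptions for illustration only.

```python
import re
from typing import Iterable, List


def merge_phrases(tokens: List[str], phrases: Iterable[str]) -> List[str]:
    """Join each space-separated phrase with '_' when it appears as whole tokens.

    Assumes that no individual token contains a space.
    """
    if not tokens:
        return []
    text = " ".join(tokens)
    for phrase in phrases:
        # (?<!\S) and (?!\S) require whitespace or a string boundary on both
        # sides, so "football team" does not match inside "football teams".
        pattern = r"(?<!\S)" + re.escape(phrase) + r"(?!\S)"
        text = re.sub(pattern, lambda m: m.group(0).replace(" ", "_"), text)
    return text.split(" ")


tokens = ['I', 'want', 'to', 'join', 'a', 'football', 'team', 'in', '2022']
print(merge_phrases(tokens, ["football team"]))
# ['I', 'want', 'to', 'join', 'a', 'football_team', 'in', '2022']

# The plain str.replace version also merges a partial match:
print(" ".join(['football', 'teams']).replace("football team", "football_team").split(" "))
# ['football_teams']
print(merge_phrases(['football', 'teams'], ["football team"]))
# ['football', 'teams']
```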
Dharman
Kris
1

This way you can define an arbitrary set of pairs.

pairs = {"football": "team"}  # first word -> word that must follow it

a_list = ['I', 'want','to','join','a','football','team','in','2022']

# Walk the list by position (not by value) so repeated words are handled correctly.
i = 0
while i < len(a_list) - 1:
    if pairs.get(a_list[i]) == a_list[i + 1]:
        # Replace the two tokens with the merged one, in place.
        a_list[i:i + 2] = [a_list[i] + "_" + a_list[i + 1]]
    i += 1
Vovin