Let's say I have a list called split_on_these that I'd like to use to split the strings in another list, text. I first pad each entry of split_on_these with spaces, so that I don't split on naturally occurring instances of those entries inside other words:
split_on_these = ['iv', 'x', 'v']
text = ["random iv text x hat v", "cat", "dog iv", "random cat x"]
padding = [" " + i + " " for i in split_on_these]
I'm trying to create new_text, which splits each entry of text on all the items contained in padding, like so:
["random", "text", "hat", "cat", "dog", "random cat"]
I tried replacing every occurrence of a padding entry in text with some character like ~ and then splitting on that character. The issue is that when I iterate over the entries in text, the pieces I get are sometimes word chunks and other times individual letters.
Please note that entire chunks preceding a delimiter should be preserved (e.g. random cat).
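
For reference, here is a minimal sketch of one way this could be approached. It swaps the space padding for regex word boundaries (\b) and uses re.split, since a padded entry like " iv " never matches a delimiter at the very end of a string such as "dog iv". The helper name split_on_words is hypothetical, and the sketch assumes every delimiter consists only of word characters (letters, digits, underscore), which is what \b needs in order to work.

```python
import re

split_on_these = ['iv', 'x', 'v']
text = ["random iv text x hat v", "cat", "dog iv", "random cat x"]


def split_on_words(entries, delimiters):
    """Split each string on whole-word delimiters and flatten the result.

    Hypothetical helper: assumes delimiters contain only word characters.
    """
    # Escape each delimiter and try longer ones first.
    alternatives = "|".join(
        re.escape(d) for d in sorted(delimiters, key=len, reverse=True)
    )
    # \b keeps 'x' from matching inside 'text'; \s* swallows the
    # whitespace on either side of a matched delimiter.
    pattern = re.compile(r"\s*\b(?:" + alternatives + r")\b\s*")

    result = []
    for entry in entries:
        for chunk in pattern.split(entry):
            chunk = chunk.strip()
            if chunk:  # drop empty pieces left by a trailing delimiter
                result.append(chunk)
    return result


new_text = split_on_words(text, split_on_these)
print(new_text)
# ['random', 'text', 'hat', 'cat', 'dog', 'random cat']
```

Because each string is split as a whole and never iterated character by character, multi-word chunks such as random cat stay intact.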