
I've got a list of lists containing strings. After a good deal of regular-expression work, I've inserted what I'd like to use as a delimiter, @@@, into my strings:

[['@@@this is part one and here is part two and here is part three and heres more and heres more'],
 ['this is part one@@@and here is part two and here is part three and heres more and heres more'],
 ['this is part one and here is part two@@@and here is part three and heres more and heres more'],
 ['this is part one and here is part two and here is part three@@@and heres more and heres more'],
 ['this is part one and here is part two and here is part three and heres more@@@and heres more']]

Now, I need to come up with this:

[['this is part one'],['and here is part two'],['and here is part three'], ['and heres more'], ['and heres more']]  

So far my attempts are bloated, hacky, and generally ugly. I find myself splitting, combining, and matching. Can anyone recommend some general advice on this type of problem, and what tools to use to keep it manageable?

EDIT: Please note that "and heres more" really does appear twice in the ideal output!
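One general way to keep this kind of problem manageable is to move the per-string logic into a small helper that can be tested on its own, and then apply it to every row. The sketch below does that without regular expressions: `str.partition` takes the text after the marker, and `str.split(..., 1)` cuts it at the next " and ". It assumes each string contains at most one `@@@` and that segments are separated by the literal text " and "; the names `extract_segment` and `data` are hypothetical, and `data` is the question's input written as a valid Python list.

```python
# Sketch only: assumes at most one "@@@" per string and that segments are
# separated by the literal text " and ". extract_segment is a hypothetical name.

def extract_segment(text, marker="@@@", stop=" and "):
    """Return the text after `marker`, cut off at the next `stop`, or None."""
    _, found, tail = text.partition(marker)
    if not found:
        return None  # this string has no marker
    # Split at most once; the piece before the first " and " is the segment.
    return tail.split(stop, 1)[0]


data = [
    ['@@@this is part one and here is part two and here is part three and heres more and heres more'],
    ['this is part one@@@and here is part two and here is part three and heres more and heres more'],
    ['this is part one and here is part two@@@and here is part three and heres more and heres more'],
    ['this is part one and here is part two and here is part three@@@and heres more and heres more'],
    ['this is part one and here is part two and here is part three and heres more@@@and heres more'],
]

# Appending row by row keeps the original order and any duplicate segments.
result = []
for row in data:
    segment = extract_segment(row[0])
    if segment is not None:
        result.append([segment])

print(result)
```

Because rows are processed in order and appended one at a time, the repeated "and heres more" segment appears twice, as the question requires.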

tumultous_rooster
  • Can you clarify the rule for when "and here's more" should be included in the output? Should it appear in the output list only once? Also, should the lists embedded within your input list all be separated by commas? – Boa Mar 12 '15 at 01:45
  • Thanks for the question. There may be duplicated text, which must remain, and the order must still be kept. – tumultous_rooster Mar 12 '15 at 01:47

1 Answer


I think you actually need to grab all the characters that come just after @@@, up to the next "and" or the end of the string.

>>> import re
>>> [[m] for x in l for m in re.findall(r'@@@(.*?)(?=\sand\b|$)', x[0])]
[['this is part one'], ['and here is part two'], ['and here is part three'], ['and heres more'], ['and heres more']]
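Here is a self-contained version of the one-liner above, assuming the input is the question's list written as a valid Python list. The names `SEGMENT_RE`, `extract_segments`, and `data` are illustrative and not from the answer. The pattern is the same; it is only compiled with `re.VERBOSE` so each part can carry a comment.

```python
import re

# Same pattern as the answer's one-liner, spread out with re.VERBOSE
# (whitespace and # comments inside the pattern are ignored).
SEGMENT_RE = re.compile(r"""
    @@@            # the delimiter inserted earlier
    (.*?)          # capture as few characters as possible...
    (?=\sand\b     # ...up to (not including) whitespace + the word "and"
      |$)          # ...or up to the end of the string
""", re.VERBOSE)

# Illustrative input: the question's data as a valid Python list.
data = [
    ['@@@this is part one and here is part two and here is part three and heres more and heres more'],
    ['this is part one@@@and here is part two and here is part three and heres more and heres more'],
    ['this is part one and here is part two@@@and here is part three and heres more and heres more'],
    ['this is part one and here is part two and here is part three@@@and heres more and heres more'],
    ['this is part one and here is part two and here is part three and heres more@@@and heres more'],
]


def extract_segments(rows):
    # Each row is a one-element list; findall returns the captured group
    # for every "@@@" found, so order and duplicates are preserved.
    return [[m] for row in rows for m in SEGMENT_RE.findall(row[0])]


print(extract_segments(data))
```

The `\b` after `and` keeps words that merely start with "and" (for example "andrew") from ending a segment early.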
Avinash Raj