I'm trying to split the following string:

sub_text = """278
00:15:13,442 --> 00:15:14,436
Mr. Burns,

279
00:15:14,454 --> 00:15:17,893
I came here because my brother
is about to be wrongfully convicted,

280
00:15:17,947 --> 00:15:19,514
and the man I'm looking for

281
00:15:19,542 --> 00:15:21,010
would help me find the truth. 
"""

into a list of (timestamp, text) tuples like this:

[('00:15:13,442 --> 00:15:14,436', 'Mr. Burns,'), ('00:15:14,454 --> 00:15:17,893', 'I came here because my brother is about to be wrongfully convicted,'), ...].

I'm trying to split the text with a regex, but it isn't working.

`re.split(r'^\d+$\n', sub_text)` returns the string unsplit (a one-element list), even though the pattern seems to match just fine when I test it.

chocojunkie
  • The regex doesn't work because you're not using the multiline flag. But why not just do: `sub_text.split('\n\n')`? – ekhumoro Nov 22 '20 at 13:57
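
To illustrate the comment above, here is a minimal sketch of both suggestions on a shortened copy of the sample. It assumes that blocks are always separated by exactly one blank line and that every block has a counter line followed by a timestamp line; `parse_blocks` is a hypothetical helper name, not something from the question.

```python
import re

sub_text = """278
00:15:13,442 --> 00:15:14,436
Mr. Burns,

279
00:15:14,454 --> 00:15:17,893
I came here because my brother
is about to be wrongfully convicted,
"""

# Without re.MULTILINE, ^ matches only at the very start of the string and $
# only at the very end (or just before a final newline), so nothing matches
# and re.split returns the whole string as a single-element list.
unsplit = re.split(r'^\d+$\n', sub_text)

# With re.MULTILINE, ^ and $ also match at the start and end of every line,
# so each counter line ("278", "279") becomes a split point.
chunks = re.split(r'^\d+$\n', sub_text, flags=re.MULTILINE)

# The non-regex route: split on blank lines, then on single newlines.
def parse_blocks(text):
    pairs = []
    for block in text.strip().split('\n\n'):
        lines = [line.strip() for line in block.split('\n')]
        # lines[0] is the counter, lines[1] the timestamps, the rest is the caption.
        pairs.append((lines[1], ' '.join(lines[2:])))
    return pairs

pairs = parse_blocks(sub_text)
```

Note that the `re.MULTILINE` split leaves an empty string as the first element of `chunks`, because the first counter sits at the very start of the text. The `split('\n\n')` route avoids that and joins multi-line captions with a space, which is the format asked for in the question.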

1 Answer

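You could use `re.findall` with a pattern that captures the timestamp line and the text after it. Note that this pattern captures at most two lines of text per entry and keeps the newline between them: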

>>> import re
>>> re.findall(r"(\d+:\d+:\d+,\d+ --> \d+:\d+:\d+,\d+)\n(.+(?:\n.+)?)\n", sub_text)
[('00:15:13,442 --> 00:15:14,436', 'Mr. Burns,'), 
 ('00:15:14,454 --> 00:15:17,893', 'I came here because my brother\nis about to be wrongfully convicted,'), 
 ('00:15:17,947 --> 00:15:19,514', "and the man I'm looking for"), 
 ('00:15:19,542 --> 00:15:21,010', 'would help me find the truth. ')]
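
If the space-joined format from the question is wanted, one possible variation is sketched below. It is an assumption-laden sketch rather than a definitive parser: it changes the optional second line `(?:\n.+)?` to `(?:\n.+)*` so that any number of text lines is captured, relies on a blank line ending each entry, and then joins the stripped lines with a space. The names `pattern` and `pairs` are illustrative.

```python
import re

sub_text = """278
00:15:13,442 --> 00:15:14,436
Mr. Burns,

279
00:15:14,454 --> 00:15:17,893
I came here because my brother
is about to be wrongfully convicted,

280
00:15:17,947 --> 00:15:19,514
and the man I'm looking for
"""

pattern = re.compile(
    r"(\d+:\d+:\d+,\d+ --> \d+:\d+:\d+,\d+)\n"  # timestamp line
    r"(.+(?:\n.+)*)"                            # one or more non-empty text lines
)

# Join the lines of each caption with a single space and drop stray
# whitespace at the line ends.
pairs = [
    (stamp, " ".join(line.strip() for line in text.split("\n")))
    for stamp, text in pattern.findall(sub_text)
]
```

Because `.` does not match a newline, the text group stops at the first blank line, so the counter of the next entry is never swallowed as long as entries are separated by a blank line.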
user2390182