-1

This is the text I am referring to:

'    High  4:55AM 1.3m   Low 11:35AM 0.34m   High  5:47PM 1.12m   Low 11:40PM 0.47m       First Light  5:59AM   Sunrise  6:24AM   Sunset  5:01PM   Last Light  5:27PM    '

Using Python and regex, I only want to capture: "High 4:55AM 1.3m Low 11:35AM 0.34" (which is the first part of the text, and ideally I'd like to capture it without the extra spaces).

I've tried this regex so far: .{44}

It manages to capture the group of text I want, which is the first 44 characters, but it also captures subsequent groups of 44 characters which I don't want.

Rakesh
Harry

2 Answers

7

If you really just want the first 44 characters, you don't need a regex: you can simply use the Python string-slice operator:

first_44_characters = s[:44]
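The slice keeps the padding spaces, which the question would ideally like removed. A minimal sketch of one way to tidy that up, assuming the sample text from the question is stored in `s` (the variable name `cleaned` is only illustrative):

```python
# Sample string from the question, split across lines for readability.
s = ('    High  4:55AM 1.3m   Low 11:35AM 0.34m   '
     'High  5:47PM 1.12m   Low 11:40PM 0.47m       '
     'First Light  5:59AM   Sunrise  6:24AM   '
     'Sunset  5:01PM   Last Light  5:27PM    ')

first_44_characters = s[:44]

# str.split() with no arguments splits on any run of whitespace and
# discards leading/trailing whitespace; joining with a single space
# collapses the padding.
cleaned = " ".join(first_44_characters.split())
print(cleaned)
```

This only works while the part of interest stays exactly 44 characters long.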

However, a regex is much more powerful, and could account for the fact that the length of the section you're interested in might change. For example, if the time is 10AM instead of 4AM the length of that part might change (or might not, maybe that's what the space padding is for?). In that case, you can capture it with a regex like this:

>>> import re
>>> re.match(r'\s+(High.*?)m', s).group(1)
'High  4:55AM 1.3'

\s matches any whitespace character, and + matches one or more of the preceding element. The parentheses define a group that starts with High and contains a minimal-length (non-greedy, .*?) sequence of any characters. The m after the parentheses means the group ends right before the first lowercase m; the M in AM is uppercase, so the match runs on to the m that follows 1.3.
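To see what "minimal-length" changes in practice, here is a small comparison of the non-greedy `.*?` with the greedy `.*`. It is only a sketch, assuming the sample text from the question is stored in `s`; the names `lazy` and `greedy` are illustrative.

```python
import re

# Sample string from the question, split across lines for readability.
s = ('    High  4:55AM 1.3m   Low 11:35AM 0.34m   '
     'High  5:47PM 1.12m   Low 11:40PM 0.47m       '
     'First Light  5:59AM   Sunrise  6:24AM   '
     'Sunset  5:01PM   Last Light  5:27PM    ')

# Non-greedy: stop before the FIRST lowercase "m" after "High".
lazy = re.match(r'\s+(High.*?)m', s).group(1)

# Greedy: run on to the LAST lowercase "m" in the string.
greedy = re.match(r'\s+(High.*)m', s).group(1)

print(repr(lazy))
print(repr(greedy))
```

With this input the greedy version swallows all four tide entries, because the last lowercase m in the string is the one after 0.47.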

If you want, you can also use the regex to extract the individual parts of the sequence:

>>> re.match(r'\s+(High)\s+(\d+\:\d+)(AM|PM)\s+(\d+\.\d+)m', s).groups()
('High', '4:55', 'AM', '1.3')
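The same idea can be extended to pull out every tide entry rather than just the first one. The following is a hedged sketch, not part of the original answer: the pattern name `TIDE` and the group names `kind`, `time`, `period` and `height` are made up for illustration, and it assumes every height is written with a decimal point, as in the sample.

```python
import re

# Sample string from the question, split across lines for readability.
s = ('    High  4:55AM 1.3m   Low 11:35AM 0.34m   '
     'High  5:47PM 1.12m   Low 11:40PM 0.47m       '
     'First Light  5:59AM   Sunrise  6:24AM   '
     'Sunset  5:01PM   Last Light  5:27PM    ')

# Named groups make the extracted parts self-describing.
TIDE = re.compile(
    r'(?P<kind>High|Low)\s+'          # "High" or "Low"
    r'(?P<time>\d+:\d+)'              # e.g. "4:55"
    r'(?P<period>AM|PM)\s+'           # "AM" or "PM"
    r'(?P<height>\d+\.\d+)m'          # e.g. "1.3", without the unit
)

# finditer scans the whole string, so leading padding needs no special case.
tides = [m.groupdict() for m in TIDE.finditer(s)]
for tide in tides[:2]:
    print(tide)
```

Taking `tides[:2]` gives the first High and first Low, which is the part the question asks about.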
vgel
0

This regex captures everything from the first "High" up to (but not including) the next "High", or up to the end of the string if there is no next one. It also drops the extra spaces at the beginning and end of the captured group.

^\s*(High.*?)\s*(?=$|High)

If you also want to collapse runs of spaces inside the captured group into single spaces, follow up with a substitution (for example, Python's re.sub) that replaces the regex " +" with " ".
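Put together in Python, the two steps might look like the sketch below. It assumes the sample text from the question is stored in `s`; the names `captured` and `result` are illustrative.

```python
import re

# Sample string from the question, split across lines for readability.
s = ('    High  4:55AM 1.3m   Low 11:35AM 0.34m   '
     'High  5:47PM 1.12m   Low 11:40PM 0.47m       '
     'First Light  5:59AM   Sunrise  6:24AM   '
     'Sunset  5:01PM   Last Light  5:27PM    ')

# Step 1: capture from the first "High" up to the next "High" (or end of string).
match = re.search(r'^\s*(High.*?)\s*(?=$|High)', s)
captured = match.group(1)

# Step 2: collapse every run of spaces inside the capture to a single space.
result = re.sub(r' +', ' ', captured)
print(result)
```

If the input might not contain "High" at all, `re.search` returns None, so check `match` before calling `group`.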

Kaddath