I have raw text files and need a regex that extracts each number from every line. The numbers are laid out in space-padded columns.

The data format looks like this:

   6   3   1   0
   7   3   1   0
   8   35002   0
   9   34104   0

My regex is:

(?P<COORD>\d+)

The matches for the first two lines are (6, 3, 1, 0) and (7, 3, 1, 0), which is correct. However, this does not work for the last two lines: their matches are (8, 35002, 0) and (9, 34104, 0). The correct groupings should be (8, 3, 5002, 0) and (9, 3, 4104, 0). How can I solve this?
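
The behaviour described above can be reproduced with a minimal sketch. Python is an assumption here, based on the `(?P<name>...)` group syntax, and the variable names are illustrative only. Because `\d+` is greedy, a 4-digit value that fills its whole column leaves no space to separate it from the digit in the previous column:

```python
import re

# The pattern from the question.
pattern = re.compile(r"(?P<COORD>\d+)")

# Two sample rows copied from the data above.
rows = ["   6   3   1   0", "   8   35002   0"]

# findall returns the text captured by the single group for every match.
found = [pattern.findall(row) for row in rows]

# The first row yields four items; the second only three,
# because "3" and "5002" touch and are matched as one run of digits.
print(found)
```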

Kelvin Lo
  • 3
    This is a fixed-width text, see https://stackoverflow.com/questions/4914008/how-to-efficiently-parse-fixed-width-files – Wiktor Stribiżew Nov 29 '21 at 15:50
  • 2
    [`(?P<COORD>(?<= {4})|(?<= {3})\d|(?<= {2})\d{2}|(?<= )\d{3}|\d{4})`](https://regex101.com/r/1sgwM4/1) – logi-kal Nov 29 '21 at 16:37
  • @horcrux This code works. How can I give each of these 4 groups of digits a different name? – Kelvin Lo Nov 30 '21 at 12:12
  • 2
    `my_regex = "".join([r" *(?P<COORD%d>(?<= {4})|(?<= {3})\d|(?<= {2})\d{2}|(?<= )\d{3}|\d{4})" % i for i in range(1,5)])` gives you [this regex](https://regex101.com/r/AiWgZO/1) – logi-kal Nov 30 '21 at 14:09
  • @horcrux Thank you! If you don't mind posting this as an answer, I would like to accept it as the best answer. – Kelvin Lo Nov 30 '21 at 15:35

1 Answer

If the numbers are aligned and the column width is fixed, you can slice each line into fixed-width columns instead of using a regex:

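width = 4  # every column is exactly 4 characters wide
lines = ["   6   3   1   0", "   8   35002   0"]  # sample rows from the question
for line in lines:
    line = line.rstrip("\n")  # a trailing newline would otherwise form an extra, empty column
    columns = [line[j:j + width] for j in range(0, len(line), width)]
    numbers = [int(c.strip()) for c in columns]
    print(numbers)
    # or as a one-liner
    print([int(line[j:j + width].strip()) for j in range(0, len(line), width)])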
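
Since the question asks for a regex with named groups, the same fixed-width idea can also be written as a pattern. This is only a sketch, assuming exactly four columns of width 4 as in the sample data. The group names `c1` to `c4` and the helper `parse_row` are made up for illustration:

```python
import re

# One named group per column; each `.{4}` consumes exactly 4 characters.
# The group names c1..c4 are hypothetical, pick whatever suits your data.
ROW = re.compile(r"(?P<c1>.{4})(?P<c2>.{4})(?P<c3>.{4})(?P<c4>.{4})")


def parse_row(line):
    """Split one fixed-width row into a dict of column name -> int."""
    match = ROW.match(line)
    if match is None:
        raise ValueError(f"row is shorter than 16 characters: {line!r}")
    # int() ignores the padding spaces; a completely blank column
    # would raise ValueError here.
    return {name: int(text) for name, text in match.groupdict().items()}


print(parse_row("   8   35002   0"))
```

The columns are split by position rather than by whitespace, so adjacent values such as `3` and `5002` end up in separate groups.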
Shanavas M