
I want to get the first n characters of the match from this regex:

(\d+\s*)

Basically I want to right pad with spaces. So in the lines (␢ marks a space, and n = 10 here):

12345␢␢␢␢␢␢␢␢123␢␢␢␢␢␢␢
123␢␢␢␢␢␢␢␢␢12345␢␢␢␢␢␢

I want to finish with:

12345␢␢␢␢␢123␢␢␢␢␢␢␢
123␢␢␢␢␢␢␢12345␢␢␢␢␢

There are always two matches on a line and the lines have a constant length.
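For reference, the transformation can also be checked outside an editor's replace dialog. Match each field with the question's own pattern, then keep only the first n characters of every match.

The sketch below is minimal and rests on a few assumptions:
- n = 10, inferred from the example.
- ␢ stands for an ordinary space.
- The helper name `pad_fields` is hypothetical.
- `\s` is narrowed to a literal space, so a match cannot run past the end of a line.
- `ljust` also pads any field that is shorter than n.

```python
import re

N = 10  # assumed field width, inferred from the example output


def pad_fields(line: str, n: int = N) -> str:
    # The question's (\d+\s*), with \s narrowed to a literal space. Each match is
    # cut to its first n characters and, if shorter, right-padded to n.
    return re.sub(r"(\d+ *)", lambda m: m.group(1)[:n].ljust(n), line)


lines = [
    "12345" + " " * 8 + "123" + " " * 7,
    "123" + " " * 9 + "12345" + " " * 6,
]
for line in lines:
    print(repr(pad_fields(line)))
```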

Clodoaldo Neto

2 Answers


Multiple passes

Based on the extra information about the problem and its structure, I would advise the following steps:

  1. Split every line in two, right before the second pattern.
  2. Grab the desired part from every line.
  3. Recombine the lines so matches are on their original line.

This means something like this:

  1. Replace ^(\d*\s*)(\d*\s*)$ with $1\r\n$2, where $1 and $2 refer to the first and second captured groups (the parts inside the parentheses). Drop the \r if you're not on Windows, though I suspect you are. You should also append a marker to the end of the first half so the lines can be rejoined later. It must be a character that does not occur anywhere else in the document (for instance #). So the replacement becomes $1#\r\n$2.
  2. Now keep the desired length of each line: replace ^(.{n})[^#\r\n]*(#?)$ with $1$2, writing the actual width for n (e.g. .{10}). This keeps the first n characters and re-appends the marker if the line has one. A plain .* in place of [^#\r\n]* would not work: being greedy, it swallows the #, so (#?) always matches the empty string.
  3. Rejoin the halves by removing every marker together with its line break: replace #\r\n with an empty string.
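The three steps can be checked in Python with `re.sub`. This is a sketch under a few assumptions:
- n = 10 and `#` is the marker.
- Line endings are \n (Python's text mode translates \r\n on read).
- The function name `multi_pass` is hypothetical.

Python writes backreferences as \1 rather than $1. The digit patterns use a literal space instead of \s, so with `re.M` no pattern can cross a line break.

```python
import re

N = 10       # assumed field width (the example output uses 10)
MARK = "#"   # assumed marker; must not occur anywhere else in the text


def multi_pass(text: str, n: int = N) -> str:
    m = re.escape(MARK)
    # Step 1: split each line before the second number, marking the first half.
    text = re.sub(r"^(\d+ *)(\d+ *)$", r"\1" + MARK + r"\n\2", text, flags=re.M)
    # Step 2: keep the first n characters of each line, plus the marker if present.
    # [^#\n]* instead of .* so the greedy part cannot swallow the marker.
    text = re.sub(rf"^(.{{{n}}})[^{m}\n]*({m}?)$", r"\1\2", text, flags=re.M)
    # Step 3: drop each marker together with its line break to rejoin the halves.
    return text.replace(MARK + "\n", "")


sample = (
    "12345" + " " * 8 + "123" + " " * 7 + "\n"
    + "123" + " " * 9 + "12345" + " " * 6 + "\n"
)
print(multi_pass(sample), end="")
```

Step 2 truncates every line it sees, which is why the first note below matters when the file contains other content.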

Notes

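  • If the file contains other lines, you'd first have to restrict these steps to the lines with the two-number structure. Note that (^\d*\s*) on its own matches every line, since both parts may be empty.
  • If you'd like a different marker, substitute it for every # in the steps above. It must not occur anywhere else in the file, at least not at the end of a line.
  • This answer uses backreferences ($1, $2) in the replacements, so your tool's regex replace must support them.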

Single pass

A single pass may also be possible here.

^(\d[\d\s]{n-1})[^\d]*(\d[\d\s]{n-1}).*$

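This matches each line of the expected shape. Groups one and two hold the first n characters of the two fields, so replacing the whole match with $1$2 leaves exactly the desired output. Here {n-1} stands for the actual count, e.g. {9} when n = 10.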
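The single-pass pattern translates directly to Python. Again this is a sketch, under these assumptions:
- n = 10, so {n-1} becomes {9}.
- The function name `single_pass` is hypothetical.
- \s is narrowed to a space and [^\d] to [^\d\n], so that a match stays within one line when the whole text is processed at once.

```python
import re

N = 10  # assumed field width, so {n-1} becomes {9}


def single_pass(text: str, n: int = N) -> str:
    # Groups 1 and 2: a digit followed by n-1 digits or spaces.
    # [^\d\n]* skips the rest of the first field's padding; .*$ drops the tail.
    pattern = rf"^(\d[\d ]{{{n - 1}}})[^\d\n]*(\d[\d ]{{{n - 1}}}).*$"
    return re.sub(pattern, r"\1\2", text, flags=re.M)


sample = (
    "12345" + " " * 8 + "123" + " " * 7 + "\n"
    + "123" + " " * 9 + "12345" + " " * 6 + "\n"
)
print(single_pass(sample), end="")
```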

ShellFish

Replace:

(\d[\d\s]{n-1})\s*

With:

$1

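This matches a digit followed by n-1 digits or whitespace characters (the first n characters of a field), plus any whitespace after them. It then replaces the whole match with just those first n characters, so you should get two matches per line. Write the actual count for {n-1}, e.g. {9} when n = 10. If you run it over the whole file at once, note that \s also matches line breaks, so the trailing \s* can swallow the newline after the second field. Using a literal space ( *) avoids that.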
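A sketch of this replacement in Python, under two assumptions: n = 10, and the function name `replace_fields` is hypothetical. \s is narrowed to a literal space in both places.

```python
import re

N = 10  # assumed field width, so {n-1} becomes {9}


def replace_fields(text: str, n: int = N) -> str:
    # A digit plus n-1 digits or spaces (kept), then any further spaces (dropped).
    # " *" instead of "\s*" so the trailing run cannot eat the line break.
    return re.sub(rf"(\d[\d ]{{{n - 1}}}) *", r"\1", text)


sample = (
    "12345" + " " * 8 + "123" + " " * 7 + "\n"
    + "123" + " " * 9 + "12345" + " " * 6 + "\n"
)
print(replace_fields(sample), end="")

# For comparison: the unmodified (\d[\d\s]{9})\s* also consumes the newlines.
joined = re.sub(r"(\d[\d\s]{9})\s*", r"\1", sample)
print(repr(joined))
```

With the unmodified pattern, both line breaks disappear and the two lines come back as one.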

Bernhard Barker