
I have the sample input below and I would like to extract each individual column using a regex, but my pattern doesn't work when columns are separated by multiple consecutive spaces. I've tried "([0-9])\s+([0-9])\s+([A-Za-z0-9- ]+)\s{2,}([A-Za-z0-9- ]+)\s+([A-Za-z0-9]+)" and expected it to work for each row.

Output
Module    Ports Type                                   Model            Serial No.
--------- ----- ------------------------------------   ---------------  -----------
1         2     CCS-7354 Series Supervisor Module      7354-SPP         JD546546527
2         1     Standby supervisor                     Unknown          Unknown
3         28    28-port SFP+ 10GigE Linecard           7234S-PC         FGK10449938

For the first row of the input, I should get:

  • "1" for "Module".
  • "2" for "Ports".
  • "CCS-7354 Series Supervisor Module" for "Type".
  • "7354-SPP" for "Model".
  • "JD546546527" for "Serial No."

I'm getting "CCS-7354 Series Supervisor Module 7354-SPP " for the Type which is incorrect.

Jojoleo

1 Answer


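Your problem is that the Type column match group [A-Za-z0-9- ]+ uses a "greedy" match. Because the character class includes the space, the group consumes as much of the line as it can (including the padding and the Model value) and only gives characters back until the rest of the pattern can match.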

Instead you should change it to a "reluctant" match [A-Za-z0-9- ]+?

Likewise, the Model column match group after that should also be changed to a reluctant match instead of a greedy match, so that it won't preemptively eat up all its trailing spaces.

Here is the final regex: ([0-9])\s+([0-9])\s+([A-Za-z0-9- ]+?)\s{2,}([A-Za-z0-9- ]+?)\s+([A-Za-z0-9]+)
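As a rough illustration of the difference, here is a minimal sketch in Python (the question does not say which regex engine is in use, so Python's re module is an assumption, and the parse helper is a hypothetical name). It runs the original greedy pattern and the reluctant pattern against the first two sample rows.

```python
import re

# First two data rows from the question's sample input.
ROWS = [
    "1         2     CCS-7354 Series Supervisor Module      7354-SPP         JD546546527",
    "2         1     Standby supervisor                     Unknown          Unknown",
]

# Original pattern from the question: greedy Type and Model groups.
GREEDY = re.compile(
    r"([0-9])\s+([0-9])\s+([A-Za-z0-9- ]+)\s{2,}([A-Za-z0-9- ]+)\s+([A-Za-z0-9]+)"
)

# Same pattern with reluctant (lazy) Type and Model groups.
RELUCTANT = re.compile(
    r"([0-9])\s+([0-9])\s+([A-Za-z0-9- ]+?)\s{2,}([A-Za-z0-9- ]+?)\s+([A-Za-z0-9]+)"
)


def parse(pattern, row):
    """Hypothetical helper: return the captured columns, or None if no match."""
    m = pattern.search(row)
    return m.groups() if m else None


if __name__ == "__main__":
    for row in ROWS:
        print("greedy:   ", parse(GREEDY, row))
        print("reluctant:", parse(RELUCTANT, row))
```

With the greedy pattern the Type group swallows the Model value and most of the padding; with the reluctant pattern each group stops at the column boundary. Note that neither pattern matches the third sample row as written, because [0-9] accepts only a single digit (Ports is "28") and the "+" in "SFP+" is not in the character class.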


Of course, there are other ways you could write the regex so that you wouldn't need the reluctant match syntax. For example: ((?:\S|\s\S)+)

This matches a run of non-space characters in which consecutive words are separated by at most one whitespace character, so the group stops at the first gap of two or more spaces.

And putting it all together, it would be: ([0-9])\s+([0-9])\s+((?:\S|\s\S)+)\s+((?:\S|\s\S)+)\s+((?:\S|\s\S)+)
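A minimal sketch of this variant in Python follows (the engine is again an assumption, and FIELD, ROW and parse_table are hypothetical names). One deliberate change from the pattern above, which is my own adjustment rather than part of the answer: the two numeric groups use [0-9]+ instead of [0-9], so that the two-digit Ports value in the third row can match.

```python
import re

# Sample table from the question, including the header and separator lines.
TABLE = """\
Module    Ports Type                                   Model            Serial No.
--------- ----- ------------------------------------   ---------------  -----------
1         2     CCS-7354 Series Supervisor Module      7354-SPP         JD546546527
2         1     Standby supervisor                     Unknown          Unknown
3         28    28-port SFP+ 10GigE Linecard           7234S-PC         FGK10449938
"""

# A field is a run of non-space characters with at most one
# whitespace character between consecutive words.
FIELD = r"((?:\S|\s\S)+)"

# Assumption: [0-9]+ (not [0-9]) so multi-digit Module/Ports values match.
ROW = re.compile(r"([0-9]+)\s+([0-9]+)\s+" + FIELD + r"\s+" + FIELD + r"\s+" + FIELD)


def parse_table(text):
    """Return one tuple of five columns per data row; other lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)  # header and dashed lines do not start with a digit
        if m:
            rows.append(m.groups())
    return rows


if __name__ == "__main__":
    for row in parse_table(TABLE):
        print(row)
```

Because the field pattern is built on \S rather than an explicit character class, it also accepts values such as "SFP+" that the class [A-Za-z0-9- ] would reject.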

Writing it this way reduces the amount of potential backtracking and should thus result in a consistently fast regex, regardless of input (although with this simple input it appears to be marginally slower).

Patrick Parker