
I have a text document here: http://regexr.com/3d7t5

Using JavaScript (Node.js), I need to match the three-digit number in each of the phrases that start with C.O. : (i.e., 001, 003, 036, and so on).

I have tried using non-capturing groups but for some reason my query is not working:

/([0-9]+)(?:C.O.            : \d\d\d)?/g
quelquecosa

2 Answers


Use this pattern:

/\bC\.O\.            : (\d\d\d)\b/g

Then perform an exec on the string and take the content of the 1st capture group.

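var rx = /\bC\.O\.            : (\d\d\d)\b/g;
var s = "C.O.            : 001 CALI\nC.O.            : 003 MIAMI\nC.O.            : 026 TEXAS";
// Each exec call on a global regex resumes after the previous match.
for (var m = rx.exec(s); m; m = rx.exec(s)) {
   console.log(m[1]); // first capture group: 001, then 003, then 026
}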

Edit: If the number and kind of whitespace may vary, you can of course adjust the regex to handle this as well:

/\bC\.O\.\s*:\s*(\d\d\d)\b/g
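
As a minimal sketch of how the whitespace-tolerant pattern might be used in Node.js, the loop below collects the captured codes into an array instead of printing them one by one. The helper name `extractCodes` and the sample string are made up for illustration; they are not part of the question.

```javascript
// Hypothetical helper: collect every three-digit code that follows "C.O. :".
function extractCodes(text) {
  // A fresh regex per call, so the global flag's lastIndex starts at 0.
  var rx = /\bC\.O\.\s*:\s*(\d\d\d)\b/g;
  var found = [];
  var m;
  while ((m = rx.exec(text)) !== null) {
    found.push(m[1]); // m[1] is the first capture group
  }
  return found;
}

// Sample input with spaces, a tab, or nothing between the label and the colon.
var sample = "C.O.            : 001 CALI\nC.O.\t: 003 MIAMI\nC.O.: 026 TEXAS";
var codes = extractCodes(sample);
console.log(codes.join(" "));
```

Because `\s` also matches tabs and newlines, this variant accepts more inputs than the literal-space pattern; whether that is desirable depends on how strict the match needs to be.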
Lucero
  • Why not replace the whitespace with `\s*`, and digits with `\d{3}`? – James Buck Apr 16 '16 at 22:33
  • @JamesBuck Because I assumed that the phrase must exactly match the pattern, and not any text beginning with `C.O.` followed by a colon and a 3-digit number. – Lucero Apr 16 '16 at 22:34
  • '*I assumed that the phrase must exactly match the pattern*' > Then I still would've used something like `\s{12}` because the whitespace is hard to see/modify/quantify in your regex; I would've gone for something like `/^C\.O\.\s{12}:\s(\d{3})/g` – RobIII Apr 16 '16 at 22:38
  • @RobIII Fair enough - copy-paste was simpler than counting the spaces. ;) Note however that `\s` matches any kind of whitespace, including tabs and newlines, so it is not equivalent at all. I've added an example with `\s` nevertheless. – Lucero Apr 16 '16 at 22:41
  • '*Note however that \s matches any kind of whitespace*' Yeah, quelquecosa isn't exactly clear on that and what can and what cannot be expected in the string. I think it's safe to assume `\s` works as he desires; if not then he should be more clear. – RobIII Apr 16 '16 at 22:49
  • @quelquecosa Well if you try my snippet you'll see that it matches three consecutive `C.O. :` lines, is that not what you want? If not, please be more specific about the requirement. – Lucero Apr 16 '16 at 22:55

If you only want to match 3 digits then

/\d{3}/gm

is all you need. But I think you require something like:

/^C\.O\..*?(\d{3})/gm

or

/^C\.O\..*?:\s*(\d{3})/gm

You can play and tinker with the above regexes here and here.
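
To illustrate the difference between the bare `\d{3}` pattern and the anchored one, here is a small sketch. The sample text is invented for the example, and `String.prototype.matchAll` is assumed to be available (Node.js 12 or later).

```javascript
// Sample text; the last line mentions C.O. but does not start with it.
const text = [
  "C.O.            : 001 CALI",
  "C.O.            : 003 MIAMI",
  "Total C.O.      : 999 IGNORED",
].join("\n");

// The bare pattern picks up every run of three digits, wanted or not.
const anyDigits = text.match(/\d{3}/g);

// With the m flag, ^ matches at the start of every line,
// so only lines that begin with "C.O." can match.
const anchored = /^C\.O\..*?:\s*(\d{3})/gm;

// matchAll requires the g flag and yields one match object per hit.
const anchoredCodes = Array.from(text.matchAll(anchored), (m) => m[1]);

console.log(anyDigits);
console.log(anchoredCodes);
```

The anchored version skips the `Total C.O.` line because `^` only matches at the beginning of a line.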


As it turns out quelquecosa left out one-or-two "minor details".

The regex should probably be something like:

/^\|\s+C\.O\.\s+:\s+(\d{3})/gm

Example here.

This matches 009 and 011 in the text below, but not the lines that start with Total C.O.

+---------------------------------------------------------------------------------------------------------------------------------------+
| UNO - VER 8.5.                                                                         HORA  :    5:56 PM |
|                                                                                                                    PAGINA:         14 |
|                                                                                                                                       |
| Empresa         : MA                                                                                                  |
| C.O.            : 009 PALMIRA2 OUTLET                                                 Fecha Inicial : 2016/04/16  Hora Inicial:       |
| Tipo Inventario : 6   ETIQUETAS Y BOLSAS                                              Fecha Final   : 2016/04/16  Hora Final  :       |
+---------------------------------------------------------------------------------------------------------------------------------------+
|GRUPO              DESCRIPCION                           U.M.      CANTIDAD         BRUTO    DESCUENTOS     IMPUESTOS     T O T A L    |
+---------------------------------------------------------------------------------------------------------------------------------------+
01                  CLASIFICACION DANE                    UNI         45.000 ** OBSEQUIO **
-----------------------------------------------------------------------------------------------------------------------------------------
Total Inventario    ETIQUETAS Y BOLSAS                                45.000             0             0             0             0
Total Inventario    ETIQUETAS Y BOLSAS                                45.000             0             0             0             0
-----------------------------------------------------------------------------------------------------------------------------------------
Total C.O.          PALMIRA2 OUTLET                                              1,001,346             0       160,254     1,161,600
Total C.O.          PALMIRA2 OUTLET                                              1,001,346             0       160,254     1,161,600
+---------------------------------------------------------------------------------------------------------------------------------------+
| UNO - VER 8.5.                                                                         HORA  :    5:56 PM |
|                                                                                                                    PAGINA:         15 |
|                                                                                                                                       |
| Empresa         : MA                                                                                                  |
| C.O.            : 011 CARTAGO                                                         Fecha Inicial : 2016/04/16  Hora Inicial:       |
| Tipo Inventario : 3   PRODUCTO TERMINADO                                              Fecha Final   : 2016/04/16  Hora Final  :       |
+---------------------------------------------------------------------------------------------------------------------------------------+
|GRUPO              DESCRIPCION                           U.M.      CANTIDAD         BRUTO    DESCUENTOS     IMPUESTOS     T O T A L    |
+---------------------------------------------------------------------------------------------------------------------------------------+
01                  CLASIFICACION DANE                    UNI         26.000       853,537       225,943       100,415       728,009
-----------------------------------------------------------------------------------------------------------------------------------------
Total Inventario    PRODUCTO TERMINADO                                26.000       853,537       225,943       100,415       728,009
Total Inventario    PRODUCTO TERMINADO                                26.000       853,537       225,943       100,415       728,009
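
A rough sketch of running that last pattern in Node.js follows. The `report` string is a shortened, hand-trimmed excerpt of the text above (column padding removed), so it is an assumption about the input rather than the real file.

```javascript
// Shortened excerpt of the report; padding is trimmed for readability.
const report = [
  "| Empresa         : MA |",
  "| C.O.            : 009 PALMIRA2 OUTLET    Fecha Inicial : 2016/04/16 |",
  "| Tipo Inventario : 6   ETIQUETAS Y BOLSAS |",
  "Total C.O.          PALMIRA2 OUTLET    1,001,346    0    160,254    1,161,600",
  "| C.O.            : 011 CARTAGO    Fecha Inicial : 2016/04/16 |",
].join("\n");

// The line must start with "|", then whitespace, then "C.O.", a colon and three digits.
const rx = /^\|\s+C\.O\.\s+:\s+(\d{3})/gm;

const reportCodes = [];
let match;
while ((match = rx.exec(report)) !== null) {
  reportCodes.push(match[1]); // the three-digit code only
}
console.log(reportCodes);
```

The `Total C.O.` line is skipped because it neither starts with `|` nor has a colon after `C.O.`.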
RobIII
  • You are correct about the non-capturing group; I updated that. The rest is all up to quelquecosa; the question / desired behaviour isn't exactly very clear. – RobIII Apr 16 '16 at 22:46
  • Right... `C.O.X123` will still be matched though. – Lucero Apr 16 '16 at 22:47
  • @RobIII, I need only the 3-digit number, not the whole phrase – quelquecosa Apr 16 '16 at 22:49
  • @quelquecosa, Did you even look at the regexes I posted? [The first one](http://www.regexr.com/3d7t2) matches only numbers (what you're asking for but pretty sure not what you intend to ask for; I'm pretty sure that's not what you want). The other 2 I'm pretty sure they *are* (close to) what you need; both return a single group with the desired result. The entire line may "light up babyblue" but you need to hover over the highlighted areas to see the matchgroup(s) etc. Long story short: you need to be **way more specific** about your exact requirements instead of letting us guess the details. – RobIII Apr 16 '16 at 22:52
  • I did, I looked at everything you posted, including links. It didn't work for me. Here is the actual file on regexr: http://regexr.com/3d7t5 – quelquecosa Apr 16 '16 at 23:00
  • @quelquecosa Next time, don't change the rules mid-game. The text you posted on regexr is nothing like the text you posted in your question. No wonder it doesn't work... There's not even "three digits" but four 'columns' of monetary amounts like `1,838,287`. – RobIII Apr 16 '16 at 23:04
  • Sorry. I thought this was about helping me, not getting more points on stack. There are three digits after C.O. . Look closely. thanks bud. – quelquecosa Apr 16 '16 at 23:07
  • About the 3 digits: I was looking at "Total C.O."; my bad on that one. But the line also starts with `| ` (not in your original question). Have a look at [this example](http://regexr.com/3d7tb). '*I thought this was about helping me, not getting more points on stack*' You have to *help us* help you. By leaving out critical information we can't help you. – RobIII Apr 16 '16 at 23:11