
I'm trying to use a regex to extract every item from an invoice (name, unit price, total, VAT, etc.). This is the text I need to match:

1 Agrafe metalice Eco, rotunjite, 33 mm, 50 buc/cutie buc. 30.00 0,76 22,80 4,33
 
 
(SOBO604)
 
2 Banda corectoare DONAU Mouse, 5 mm x 8 m, orizontala, buc. 5.00 4,83 24,15 4,59
 
 
blister (7635001PL-99)
 
 
3 Biblioraft plastifiat OFFICE Products, 5 cm, colturi buc. 75.00 5,08 381,00 72,39
 
 
metalice, albastru (21011121-01)
 
4 Burete magnetic DONAU, 110 x 57 x 25 mm, galben buc. 10.00 5,53 55,30 10,51
 
 
(7638001PL-99)
 
 
5 Calculator de birou Canon WS-1610T, solar, 16 cifre, buc. 1.00 71,11 71,11 13,51
 
 
afisaz inclinat, format mare (WS1610T)
 
6 Capse zincate OFFICE Products 24/6, 1000 buc/cutie buc. 5.00 1,12 5,60 1,06
 
 
(18072419-19)
 
 
7 Creion grafic Eco, ascutit, cu radiera, corp verde buc. 20.00 0,40 8,00 1,52
 
 
(SOIS432)

8 Creion mecanic BIC Matic, 0.7 mm (601021) buc. 4.00 1,88 7,52 1,43

9 Dosar din plastic cu sina si doua perforatii OFFICE buc. 250.00 0,35 87,50 16,63

Products, albastru (21104211-01)

10 Dosar din plastic cu sina si doua perforatii OFFICE buc. 100.00 0,35 35,00 6,65

Products, roz (21104211-13)

pagina 1 / 3

 797638

             
11 Folie protectie OFFICE Products, A4, coaja portocala, 40 buc. 5.00 6,53 32,65 6,20
 
 
microni, 100 file/set (21141215-90)
 
 
12 Folie protectie OFFICE Products, A4, coaja portocala, 40 buc. 20.00 6,51 130,20 24,74
 
 
microni, 100 file/set (21141215-90)
 
13 Marker whiteboard Eco, varf rotund, albastru (SOIS535A) buc. 104.00 1,33 138,32 26,28
 
 
14 Marker whiteboard Eco, varf rotund, negru (SOIS535N) buc. 2.00 1,33 2,66 0,51
 
 
15 Marker whiteboard Eco, varf rotund, rosu (SOIS535R) buc. 2.00 1,33 2,66 0,51
 
16 Notite adezive OFFICE Products,  51 x 76 mm, galben pal,  buc. 5.00 1,65 8,25 1,57
 
 
100 file (14047511-06)
 
 
17 Organizator de birou DONAU Clasic VII, 6 compartimente, buc. 2.00 30,67 61,34 11,65
 
 
155 x 105 x 101 mm, transparent (7476001-99)
 
18 Panou din pluta Bi-Office, 60 x 90 cm, rama lemn buc. 1.00 32,96 32,96 6,26
 
 
(GMC070012010)
 
 
19 Pioneze color Eco, tinte pentru pluta , 40 buc/cutie buc. 1.00 2,16 2,16 0,41
 
 
(SOBO612)
 
20 Pix fara mecanism Eco, varf de 1 mm, albastru (SOIS405A) buc. 110.00 0,33 36,30 6,90
 
 
21 Plic C4 (229 x 324 mm), alb, siliconic, 10/set buc. 2.00 2,15 4,30 0,82
 
 
(15223619-14)
 
 
22 Tus pentru stampila Pelikan, cu picurator, 28 ml, negru buc. 1.00 6,93 6,93 1,32
 
(351197) 

Someone helped me and came up with this pattern:

\d{1,2}(.*)(\d+\.\d+\s+)(\d+\,\d+\s?){3}([\n ]+[^(\n]*\([^)]+\)(?=\n))?

The problem is that it works on regex101.com, but that is not a .NET regex tester. On regexstorm.net/tester, which does use the .NET engine, the last part of the pattern doesn't match.

I need some help understanding what the problem is. Thanks.

Wiktor Stribiżew
RKTM
`\d{1,2}(.*)(\d+\.\d+\s+)(\d+\,\d+\s?){3}([\n ]+[^(\n]*\([^)]+\)(?=\n))?` is valid .NET regex syntax, in the sense that it won't throw an invalid-syntax error. You are simply testing it against a different string. Also, `\d` is Unicode-aware in .NET by default, so it matches not only ASCII digits but digits like these as well: `٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙0123456789` – Wiktor Stribiżew Nov 21 '20 at 13:28
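The point about `\d` can be checked quickly with the sketch below. It uses Python's `re` module as a stand-in for .NET, since Python's `\d` is also Unicode-aware for `str` patterns by default; `re.ASCII` (or an explicit `[0-9]` class) plays roughly the role that `[0-9]` or `RegexOptions.ECMAScript` plays in .NET. The helper `all_digits` is a hypothetical name used only for this illustration.

```python
import re

ARABIC_INDIC = "٠١٢٣٤٥٦٧٨٩"  # U+0660..U+0669
DEVANAGARI = "०१२३४५६७८९"    # U+0966..U+096F


def all_digits(pattern, text, flags=0):
    """Return True if every character of text fully matches the one-char pattern."""
    return all(re.fullmatch(pattern, ch, flags) for ch in text)


# In a str pattern, \d matches any Unicode decimal digit by default...
print(all_digits(r"\d", ARABIC_INDIC))            # True
print(all_digits(r"\d", DEVANAGARI))              # True
# ...while re.ASCII, or an explicit [0-9] class, limits matching to 0-9.
print(all_digits(r"\d", ARABIC_INDIC, re.ASCII))  # False
print(all_digits(r"[0-9]", DEVANAGARI))           # False
```

For invoice text that only ever contains ASCII digits this makes no practical difference, but `[0-9]` states the intent more precisely.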

1 Answer


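The positive lookahead (?=\n) at the end of the pattern requires a bare \n right after the closing parenthesis, so it fails when the text uses \r\n line endings, and it also fails for the last item, which is followed by the end of the string rather than a newline. You can replace it with a group that matches either an optional carriage return followed by a newline, or asserts the end of the string: (?:\r?\n|$)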

You can also change [\n ]+ to \s+ so that it matches a carriage return as well, and add \r to the negated character class: [^(\r\n].

\d{1,2}(.*)(\d+\.\d+)\s+(\d+\,\d+\s?){3}(\s+[^(\r\n]*\([^)]+\)(?:\r?\n|$))?

.NET regex demo

The fourth bird
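The sketch below compares the question's pattern with the answer's pattern on a trimmed two-item excerpt of the invoice, once with \n and once with \r\n line endings. It uses Python's `re` as a stand-in for the .NET engine; the two agree on the constructs involved here (`.` excludes only \n, `\s` matches \r, `$` matches at the end of the string). The helper `product_codes`, and the assumption that the .NET tester submits the text with \r\n line endings, are illustrative and not taken from the thread.

```python
import re

# Pattern from the question: the final lookahead demands a bare "\n".
ORIGINAL = r"\d{1,2}(.*)(\d+\.\d+\s+)(\d+\,\d+\s?){3}([\n ]+[^(\n]*\([^)]+\)(?=\n))?"
# Pattern from the answer: accepts "\r\n", "\n" or the end of the string.
FIXED = r"\d{1,2}(.*)(\d+\.\d+)\s+(\d+\,\d+\s?){3}(\s+[^(\r\n]*\([^)]+\)(?:\r?\n|$))?"

# Trimmed excerpt of the invoice: the last two items, each followed by its code.
LINES = [
    "21 Plic C4 (229 x 324 mm), alb, siliconic, 10/set buc. 2.00 2,15 4,30 0,82",
    "",
    "(15223619-14)",
    "",
    "22 Tus pentru stampila Pelikan, cu picurator, 28 ml, negru buc. 1.00 6,93 6,93 1,32",
    "",
    "(351197)",
]
UNIX_TEXT = "\n".join(LINES)
WINDOWS_TEXT = "\r\n".join(LINES)


def product_codes(pattern, text):
    """Return group 4 (the trailing product code), whitespace-stripped, per match."""
    return [m.group(4).strip() if m.group(4) else None
            for m in re.finditer(pattern, text)]


for name, pattern in (("original", ORIGINAL), ("fixed", FIXED)):
    for label, text in (("\\n", UNIX_TEXT), ("\\r\\n", WINDOWS_TEXT)):
        print(f"{name:8} {label:5} {product_codes(pattern, text)}")
```

With \n endings the original pattern misses only the last product code (nothing but the end of the string follows it); with \r\n endings it misses both, because a \r sits between the closing parenthesis and the \n. The answer's pattern captures both codes in either case.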