I have been trying this for about 2 hours, and I'm not sure whether what I want to do is even possible.
I have a large file with data that looks like this:
43034452 LONGSHIRTPAIETTE 17.30
27.90
0110
COLOR : : : : :
: : :
-11 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
43034453 LONG SHIRT PAI ETTE 16.40
25.90
0110
COLOR : : : : :
: : :
-3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
43034454 BASIC 4.99
8.90
0110
COLOR : : : : :
: : :
-5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
(The file has 36k rows.)
What I want is to clean this whole thing up.
In the end, the rows should look like this:
43034452;LONGSHIRTPAIETTE;17.30;27.90;0110
43034453;LONG SHIRT PAI ETTE;16.40;25.90;0110
43034454;BASIC;4.99;8.90;0110
So there is a lot of data that I don't need. I'm using Notepad++ for the regex.
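Outside Notepad++, the same clean-up can also be done without a regex. Below is a minimal Python sketch of that line-based alternative. It assumes every record starts with a header line made of article number, name, and first price, with the second price and the 0110-style code on the next two lines, as in the sample above. `parse_records` is a hypothetical helper name.

```python
from typing import Iterable, Iterator


def parse_records(lines: Iterable[str]) -> Iterator[str]:
    """Yield 'number;name;price;price2;code' for each record (assumed layout)."""
    it = iter(lines)
    for line in it:
        # str.split() with no argument splits on any whitespace, including the
        # form feed character (\f), so a leading FF does not end up in parts[0].
        parts = line.split()
        # Header line: all-digit article number, a name that may contain spaces,
        # and the first price as the last token.
        if len(parts) >= 3 and parts[0].isdigit():
            number, name, price = parts[0], " ".join(parts[1:-1]), parts[-1]
            # Assumption: the next two lines hold the second price and the code.
            second_price = next(it, "").strip()
            code = next(it, "").strip()
            yield ";".join([number, name, price, second_price, code])
```

The COLOR lines, the `: : :` lines and the `-11 0 0 ...` lines never pass the header check, so they are skipped. To use it on the real file, iterate over `open(...)` with whatever file name and encoding apply. Note that `" ".join` collapses runs of spaces inside a name into a single space.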
At the moment, my regex looks like this:
([0-9]*)\s{6,}([A-Z]*)\s*([0-9\.]*)\s*([0-9\.]*)\s*([0-9]*)
This matches the first number followed by six or more whitespace characters. (It has to be this way because some rows start with FF, which is not a letter. It's some kind of character that I can't identify, but when I turn on Show All Characters in Notepad++, it is displayed as FF.)
So as a result I get
\1: 43034452
\2: LONGSHIRTPAIETTE
\3: 17.30
\4: 27.90
\5: 0110
as expected, but on the next record (LONG SHIRT PAI ETTE) the name group stops at the first space. If I add \s to the pattern, it also selects all the spaces after the name. And I obviously can't say "only one space", can I?
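A regex can in fact express "exactly one space": a literal space matches exactly one space character. A name can therefore be written as a word followed by any number of "one space plus word" repeats, `[A-Z]+(?: [A-Z]+)*`. The Python sketch below compares the original pattern with that revised one on the second record. It assumes the columns are separated by six or more spaces (which the `\s{6,}` implies) and that names consist only of uppercase letters separated by single spaces. Python's `re` is used only to make a runnable demo; the same pattern syntax should also work in Notepad++'s Regular expression mode.

```python
import re

# Assumption: 6 spaces between columns; the real widths in the file are unknown.
SEP = " " * 6
record = f"43034453{SEP}LONG SHIRT PAI ETTE{SEP}16.40\n{SEP}25.90\n{SEP}0110\n"

# Original pattern: [A-Z]* stops at the first space inside the name.
original = re.compile(r"([0-9]*)\s{6,}([A-Z]*)\s*([0-9\.]*)\s*([0-9\.]*)\s*([0-9]*)")

# Revised pattern: group 2 is one word, then zero or more
# "exactly one space + word" repeats, so it cannot swallow the run of
# spaces before the price.
revised = re.compile(r"([0-9]+)\s{6,}([A-Z]+(?: [A-Z]+)*)\s*([0-9.]+)\s*([0-9.]+)\s*([0-9]+)")

print(original.search(record).groups())
# ('43034453', 'LONG', '', '', '')
print(revised.search(record).groups())
# ('43034453', 'LONG SHIRT PAI ETTE', '16.40', '25.90', '0110')
```

Switching `*` to `+` in the revised groups also keeps them from matching empty strings, which is what lets the original pattern "succeed" with three empty groups.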
So my question is, can I use regex to get a selection like the one I want?
If so, what am I doing wrong?
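To also drop the COLOR lines and the `-11 0 0 ...` line in one Replace All, the match has to consume the rest of each record. Here is a Python sketch of that idea. It assumes every record ends with a line consisting of a number followed only by zeros, that lines end in `\n` (what Python's text mode gives you), and that the FF shown by Notepad++ is the form feed control character (`\f`, 0x0C). `RECORD` and `clean` are hypothetical names. In Notepad++ the rough equivalent would presumably be the same find pattern with ". matches newline" ticked and `\1;\2;\3;\4;\5` plus a line break as the replacement, though Windows `\r\n` line endings may need adjusting.

```python
import re

# Hypothetical sample; 6 spaces between columns is an assumption.
SEP = " " * 6
SAMPLE = (
    f"43034452{SEP}LONGSHIRTPAIETTE{SEP}17.30\n"
    f"{SEP}27.90\n"
    f"{SEP}0110\n"
    "COLOR : : : : :\n"
    ": : :\n"
    "-11 0 0 0 0 0\n"
    f"\f43034454{SEP}BASIC{SEP}4.99\n"
    f"{SEP}8.90\n"
    f"{SEP}0110\n"
    "COLOR : : : : :\n"
    ": : :\n"
    "-5 0 0 0 0 0\n"
)

RECORD = re.compile(
    # \f? consumes a leading form feed so it does not survive in the output.
    r"\f?([0-9]+)\s{6,}([A-Z]+(?: [A-Z]+)*)\s*([0-9.]+)\s*([0-9.]+)\s*([0-9]+)"
    # Lazily skip everything up to and including the "-N 0 0 ... 0" line.
    r".*?^-?[0-9]+(?:[ \t]+0)+[ \t]*$\n?",
    re.DOTALL | re.MULTILINE,
)


def clean(text: str) -> str:
    """Replace each whole record with 'number;name;price;price2;code'."""
    return RECORD.sub(r"\1;\2;\3;\4;\5\n", text)


print(clean(SAMPLE), end="")
```

The lazy `.*?` is what keeps one match from running into the next record. If a record ever lacks the trailing zeros line, it would swallow the following record, so the output should be spot-checked against the 36k-row file.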