I have been trying to solve one problem for a while now, without success. At first it looked like a trivial issue, but I am stuck on it...
Anyhow, I need to solve the following problem. I have a very large CSV file with lines in the following format:
NUMBER(9);NUMBER(1);NUMBER(9-10);NUMBER(2);NUMBER(1);...;NUMBER(2);NUMBER(1);STRING;DATE(DD.MM.YYYY);NUMBER(1351)
For example:
517755369;1;0001303717;48;1;63;8;50;2;51;6;53;7;55;3;57;4;59;5;;;;;CALL;07.12.2012;1351
In each line, after the first three fields, I have 1 to 10 pairs NUMBER(2);NUMBER(1) (unused pair slots are left empty, as in the example above), followed by another three fields STRING;DATE(DD.MM.YYYY);NUMBER(1351).
I need to transform that file into a file with the following structure:
517755369;1;0001303717;48;1;CALL;07.12.2012;1351
517755369;1;0001303717;63;8;CALL;07.12.2012;1351
517755369;1;0001303717;50;2;CALL;07.12.2012;1351
517755369;1;0001303717;51;6;CALL;07.12.2012;1351
517755369;1;0001303717;53;7;CALL;07.12.2012;1351
517755369;1;0001303717;55;3;CALL;07.12.2012;1351
517755369;1;0001303717;57;4;CALL;07.12.2012;1351
517755369;1;0001303717;59;5;CALL;07.12.2012;1351
So each line of the input file should be transformed into as many lines as the original line has non-empty NUMBER(2);NUMBER(1) pairs.
Here is a sample of the input file:
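To make the intended transformation concrete, here is a minimal Python sketch of it (not the sed/awk/perl one-liner being asked for). It assumes every line has exactly three leading and three trailing fields with the pairs in between, and that a pair counts as unused when its first field is empty. The names `expand` and `transform` are made up for illustration, and its speed on a 1–1.5M line file has not been measured.

```python
LEAD = 3   # fixed fields before the pairs (assumed from the format above)
TRAIL = 3  # fixed fields after the pairs: STRING;DATE;NUMBER

def expand(line, sep=";"):
    """Yield one output line for each non-empty pair in a single input line."""
    fields = line.rstrip("\r\n").split(sep)
    head = fields[:LEAD]
    tail = fields[-TRAIL:]
    pairs = fields[LEAD:-TRAIL]
    # Walk the middle fields two at a time; empty slots are padding.
    for i in range(0, len(pairs) - 1, 2):
        if pairs[i]:
            yield sep.join(head + pairs[i:i + 2] + tail)

def transform(src, dst):
    """Stream lines from the file object src, writing expanded lines to dst."""
    for line in src:
        for out in expand(line):
            dst.write(out + "\n")
```

Calling `transform(sys.stdin, sys.stdout)` (after `import sys`) would let the script act as a filter between the input and output files. It reads one line at a time, so memory use should not grow with file size.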
517760344;2;000601301061;31;1;;;;;;;;;;;;;;;;;;;CALL;07.12.2012;1351
518855369;1;000601303717;48;1;63;8;50;2;51;6;53;7;55;3;57;4;59;5;;;;;CALL;07.12.2012;1351
519775067;1;000601300771;4;2;6;3;19;1;;;;;;;;;;;;;;;CALL;07.12.2012;1351
617773407;1;000603252922;13;1;17;2;27;3;;;;;;;;;;;;;;;CALL;07.12.2012;1351
717764779;1;000601304021;31;1;;;;;;;;;;;;;;;;;;;CALL;07.12.2012;1351
In general, I need a regexp that I can use with sed or awk (or a Perl script that I can run against the input file). The original input file has roughly 1–1.5M records. The transformation should finish as quickly as possible (within 5 minutes at most).
Thanks