
Problem: a multiline, semicolon-delimited file has been split at column 79 or 80 (not always the same column, for some strange reason).

It seems to me that a Regex would be the appropriate solution, so now I have two problems.

Lines are:

1sdf.............................mno[cr][lf]
pqr........xyz......................[cr][lf]
.....|.....|.....|.....|.....|.....|[cr][lf]
2sdf.............................mno[cr][lf]
pqr........xyz......................[cr][lf]
.....|.....|.....|.....|.....|.....|[cr][lf]
3sdf.............................mno[cr][lf]
pqr........xyz......................[cr][lf]
.....|.....|.....|.....|.....|.....|[cr][lf]
4sdf.............................mno[cr][lf]
pqr........xyz......................[cr][lf]
.....|.....|.....|.....|.....|.....|[cr][lf]
... 10000 rows ...

Here the pipe stands for a non-space whitespace character (possibly a tab).

I need:

1sdf.............................mnopqr........xyz......................[cr][lf]
2sdf.............................mnopqr........xyz......................[cr][lf]
3sdf.............................mnopqr........xyz......................[cr][lf]
4sdf.............................mnopqr........xyz......................[cr][lf]

I managed to get the job done in three passes.

Pass 1: Replace ^\s*\r\n with \rxxx\n (that is, replace blank lines with \rxxx\n), leaving:

1sdf.............................mno[cr][lf]
pqr........xyz......................[cr][lf]
[cr]xxx[lf]
2sdf.............................mno[cr][lf]
pqr........xyz......................[cr][lf]

Pass 2: Replace \r\n with [empty], leaving:

1sdf.............................mnopqr........xyz......................[cr]
xxx[lf]
2sdf.............................mnopqr........xyz......................

Pass 3: Replace \rxxx\n with \r\n, leaving:

1sdf.............................mnopqr........xyz......................[cr][lf]
2sdf.............................mnopqr........xyz......................

And the rest of the cleanup is trivial.
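For reference, the three passes above can be sketched outside the editor as well. This is only an illustration in Python, assuming the file content is already in a string with its \r\n endings intact and that the placeholder \rxxx\n never occurs in real data; the function name and the sample are made up for the example.

```python
import re

PLACEHOLDER = "\rxxx\n"  # assumption: never occurs in the real data


def three_pass_join(text: str) -> str:
    # Pass 1: swap each whitespace-only line (terminator included) for the placeholder.
    marked = re.sub(r"^\s*\r\n", lambda m: PLACEHOLDER, text, flags=re.MULTILINE)
    # Pass 2: delete every remaining CRLF, which glues the wrapped pieces together.
    joined = marked.replace("\r\n", "")
    # Pass 3: turn the placeholders back into real line endings.
    return joined.replace(PLACEHOLDER, "\r\n")


# Hypothetical two-record sample: two wrapped pieces, then a whitespace-only line.
sample = "1abc\r\ndef\r\n \t \r\n2abc\r\ndef\r\n \t \r\n"
print(repr(three_pass_join(sample)))
```

When reading the real file in Python, opening it with newline="" keeps the \r\n sequences from being translated before the substitutions run.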

Is there any way of doing this in a single step? The output is from a common financial application, and I'd prefer to fix the files myself rather than try to get many clients to adjust their output.

Chris Cudmore
  • does replacing `\r\n\s*((\r\n)?)` with captured group no. `1` not work? (If you tell us which engine/technology you are using I could possibly test it myself and post it as a proper answer ;)) – Martin Ender Oct 11 '12 at 21:55
  • Nope. It replaces all newlines, and leaves me with one single line. I'm playing with it in notepad++, but I can adjust flavours as required. – Chris Cudmore Oct 12 '12 at 12:39
  • I just tested it, it works with a minor caveat. Let me write an answer... – Martin Ender Oct 12 '12 at 12:52

2 Answers


In Notepad++ (using regular expression mode) you can use this:

Find what: \r\n(\s*\r\n)?

Replace with: \1

Then run "Replace All" exactly once. However, make sure you update to Notepad++ 6! Otherwise matching \r\n with a regular expression won't work in Notepad++.
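The same find/replace pair can be tried outside Notepad++. The following is a minimal sketch in Python, assuming the separator line is whitespace-only and the text keeps its \r\n endings; the function name and sample data are hypothetical. A callback stands in for \1 so that an unmatched optional group becomes an empty string.

```python
import re

# Same pattern as above: a CRLF, optionally followed by a whitespace-only line.
WRAP = re.compile(r"\r\n(\s*\r\n)?")


def join_wrapped(text: str) -> str:
    # Keep the captured blank line (if any); a lone CRLF is simply dropped.
    return WRAP.sub(lambda m: m.group(1) or "", text)


# Hypothetical two-record sample: two wrapped pieces, then a whitespace-only line.
sample = "1abc\r\ndef\r\n \t \r\n2abc\r\ndef\r\n \t \r\n"
print(repr(join_wrapped(sample)))
```

Note that the whitespace from the separator line ends up at the end of each joined record, so a trailing-whitespace trim may still be wanted afterwards.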

Martin Ender

Assuming that ^\s*\r\n matches the line you want to remove, as you said above, I believe you could do it by replacing \r\n\s*\r\n|\r\n with \r\n
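One thing worth checking when trying this: with a fixed replacement of \r\n, the lone \r\n branch is replaced by itself, so the blank lines disappear but the wrapped pieces may stay on separate lines. A possible workaround, sketched here in Python under the assumption that a replacement callback is available, is to choose the replacement according to which branch matched. The capture group, the function name and the sample are additions for illustration only.

```python
import re

# Branch 1 (captured): a CRLF followed by a whitespace-only line.
# Branch 2: any other CRLF, i.e. a wrap point inside a record.
ALTERNATION = re.compile(r"(\r\n\s*\r\n)|\r\n")


def join_with_alternation(text: str) -> str:
    # Record boundary -> keep one CRLF; wrap point -> remove the CRLF.
    return ALTERNATION.sub(lambda m: "\r\n" if m.group(1) else "", text)


# Hypothetical two-record sample: two wrapped pieces, then a whitespace-only line.
sample = "1abc\r\ndef\r\n \t \r\n2abc\r\ndef\r\n \t \r\n"
print(repr(join_with_alternation(sample)))
```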

It's my first regex, so if it doesn't work, don't be too harsh :-)

Good luck

Luis