1

I have a collection of strings like these (each gap is a tab character):

29  301 3   31  0       TREZILIDE       Trézilidé
2A  001 1   73  1   (LE)    AFA (Le)    Afa

What I want is to transform it into this:

29301 Trézilidé
2A001 (Le) Afa
  • Remove the first tab
  • Remove the tabs, the numbers and the first uppercase occurrence (replacing the whole lot with a single space)
  • Replace the last tab with a space

My biggest problems are:

  • How to select the first tab without selecting the "prefix" and the "suffix"? (like ^(..)\t[0-9] but without selecting ^(..) or [0-9])
  • How to select from just after the 3 digits to just after the tab that follows the uppercase word?

I am doing this in a text file, using the search-and-replace dialog of Notepad++.

Thanks in advance for your help!

BoltClock
Pascal Qyy

2 Answers

6

How to select the first tab without selecting the "prefix" and the "suffix"?

Ideally this would be done with lookahead and lookbehind assertions, but Notepad++ doesn't support those before version 6.0. The next best solution is to capture the prefix and the suffix, then backreference them in the replacement string.
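
For comparison, here is roughly what the assertion-based approach looks like in an engine that does support lookarounds. This is only a sketch in Python's `re` module, not something Notepad++ itself runs; `drop_first_tab` is a hypothetical helper name, and the sample strings assume the columns are separated by single tab characters.

```python
import re

# Match a tab only when it is preceded by exactly two characters at the
# start of the line and followed by a digit. The lookbehind and lookahead
# are zero-width, so only the tab itself is part of the match.
FIRST_TAB = re.compile(r"(?<=^..)\t(?=[0-9])")


def drop_first_tab(line: str) -> str:
    """Remove the tab between the two-character prefix and the digits."""
    return FIRST_TAB.sub("", line)


print(repr(drop_first_tab("29\t301\t3")))  # '29301\t3'
print(repr(drop_first_tab("2A\t001\t1")))  # '2A001\t1'
```

Because the prefix and the digit are never consumed, the replacement string can simply be empty and no backreferences are needed.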

Here's how I did it (in answer to your full question):

  1. Check Match case to do a case-sensitive find

  2. Find by regex:

    ^(..)\t(\d\d\d)[\tA-Z0-9()]+\t(.+)$
    

    Replace with:

    \1\2 \3
    

    I end up with this, where <tab> represents an actual tab character:

    29301 Trézilidé
    2A001 (Le)<tab>Afa
    
  3. To get rid of that leftover tab, I do a find in Extended search mode:

    \t
    

    And replace it with the space character, to obtain the final result:

    29301 Trézilidé
    2A001 (Le) Afa
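
The same two passes can be tried outside Notepad++. The following is a minimal sketch in Python, where `convert` is a hypothetical helper name and the exact tab layout of the two sample lines is an assumption (the tabs are not visible in the question as displayed). Python's `re` is case-sensitive by default, which corresponds to checking Match case.

```python
import re

# Same pattern as in step 2: capture the 2-character prefix, the 3 digits,
# and everything after the last tab that follows the uppercase columns.
PATTERN = re.compile(r"^(..)\t(\d\d\d)[\tA-Z0-9()]+\t(.+)$")


def convert(line: str) -> str:
    """Apply the capture-and-backreference pass, then turn leftover tabs into spaces."""
    merged = PATTERN.sub(r"\1\2 \3", line)
    return merged.replace("\t", " ")


# Assumed tab layout for the two sample lines from the question.
samples = [
    "29\t301\t3\t31\t0\t\tTREZILIDE\t\tTrézilidé",
    "2A\t001\t1\t73\t1\t(LE)\tAFA\t(Le)\tAfa",
]

for sample in samples:
    print(convert(sample))
# 29301 Trézilidé
# 2A001 (Le) Afa
```

The character class `[\tA-Z0-9()]+` is greedy, so it first swallows the leading `(L` of `(Le)` and then backtracks to the nearest tab. That is why the second line still contains a tab after the first pass.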
    
BoltClock
  • +1 I was running an older version of notepad++ and was wondering why I couldn't use assertions. Thankfully this post was one of the first that came up! – Malachi Apr 27 '12 at 09:09
  • @Malachi: I haven't been able to come up with a solution to this particular question using assertions. But it's always good to know it finally supports them now :) – BoltClock Apr 27 '12 at 09:18
  • @BoltClock'saUnicorn: It looks like lookbehinds are not supported :( Only forward assertions. – Malachi Apr 27 '12 at 10:10
  • @Malachi: Interesting. Maybe more complex ones aren't supported. I can match `abc` in `; abc` using `(?<=;\s).*` but not when I use `(?<=;\s*).*` – BoltClock Apr 27 '12 at 10:12
  • @BoltClock'saUnicorn: I think it could be. I'm no regular expression expert but `(>"(?!\s+)).+((?<!\s+)"<)` This works in eclipse but not in Notepad++ – Malachi Apr 27 '12 at 10:51
1

Try

^(..)\t

Replace with

\1

Then

\(*[A-Z][A-Z]+\)*

Replace with an empty string (with Match case checked, otherwise lowercase letters match as well); this removes TREZILIDE, (LE) and AFA.

''

Then

^(.....)[\t0-9]+\t(.+)$

Replacement:

\1 \2

And finally:

\t

Replace every occurrence with a space.
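
As a rough check, this sequence of passes can be sketched in Python. `clean` is a hypothetical helper name, and the tab layout of the sample line is an assumption. The third pass in this sketch uses `^(.....)[\t0-9]+\t(.+)$`, which keeps accented letters and a parenthesized article in the captured name.

```python
import re


def clean(line: str) -> str:
    """Run the four search-and-replace passes in order on one line."""
    # Pass 1: drop the tab after the two-character prefix.
    line = re.sub(r"^(..)\t", r"\1", line)
    # Pass 2: remove runs of two or more uppercase letters,
    # together with any parentheses around them.
    line = re.sub(r"\(*[A-Z][A-Z]+\)*", "", line)
    # Pass 3: keep the five-character code, skip the tabs and digits
    # that follow it, and keep the rest of the line.
    line = re.sub(r"^(.....)[\t0-9]+\t(.+)$", r"\1 \2", line)
    # Pass 4: any tab that is still left becomes a space.
    return line.replace("\t", " ")


print(clean("2A\t001\t1\t73\t1\t(LE)\tAFA\t(Le)\tAfa"))  # 2A001 (Le) Afa
```

Pass 2 relies on case-sensitive matching: `(Le)` and `Afa` survive only because they never contain two uppercase letters in a row.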

HTH

Zsolt Botykai