Assume a text file with about 40k lines like the following, as an example:

Color LaserJet 8500, Color Laserjet 8550, Color Laserjet 8500N, Color Laserjet 8500DN, Color Laserjet 8500GN, Color Laserjet 8550N, Color Laserjet 8550DN, Color Laserjet 8550GN, Color Laserjet 8550 MFP, 

Is anyone able to help me with a regex that trims out everything after the numbers but before the comma, so that 8500N becomes just 8500?

The end result would be:

Color Laserjet 8500, Color Laserjet 8550, Color Laserjet 8500, Color Laserjet 8500, Color Laserjet 8500, Color Laserjet 8550, Color Laserjet 8550, Color Laserjet 8550, Color Laserjet 8550, 

Bonus kudos to anybody who can also suggest the best way to remove duplicates in Notepad++ (or another easily available program).

NRGdallas

3 Answers

Replace each match of `(?<=\d)[^\d,]+(?=,)` with an empty string.

The regex reads: "one or more characters that are neither digits nor commas, preceded by a digit and followed by a comma".

If such a number with trailing letters can also appear at the end of the string (or line), where there is no comma after it, and you want that trimmed as well, use `(?<=\d)[^\d,]+(?:(?=,)|$)`.

This reads the same, except that "or end of string" is added as an alternative to "followed by a comma".
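For anyone who wants to check the two patterns outside of an editor, here is a minimal sketch in Python (not Notepad++). The names `BEFORE_COMMA`, `BEFORE_COMMA_OR_EOL` and `trim_suffixes` are made up for this illustration, and the `re.MULTILINE` flag is my assumption for treating `$` as "end of each line" rather than "end of the whole text".

```python
import re

# Pattern 1: non-digit, non-comma characters between a digit and a comma.
BEFORE_COMMA = re.compile(r"(?<=\d)[^\d,]+(?=,)")

# Pattern 2: same, but the run may also be followed by the end of a line.
# re.MULTILINE (an assumption here) lets $ match at every line end.
BEFORE_COMMA_OR_EOL = re.compile(r"(?<=\d)[^\d,]+(?:(?=,)|$)", re.MULTILINE)


def trim_suffixes(text, pattern=BEFORE_COMMA):
    """Delete every match of the pattern, i.e. replace it with an empty string."""
    return pattern.sub("", text)


sample = "Color Laserjet 8500N, Color Laserjet 8550DN, Color Laserjet 8550 MFP"

# The last entry has no comma after it, so pattern 1 leaves " MFP" alone.
print(trim_suffixes(sample))
# Pattern 2 also trims the suffix at the end of the line.
print(trim_suffixes(sample, BEFORE_COMMA_OR_EOL))
```

The only difference between the two calls is whether the final entry, which is not followed by a comma, gets trimmed.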


Update:

Since it seems that Notepad++ does not support regex lookaround, the alternative is to replace `(\d)([^\d,]+)(,)` with `\1\3`, or `(\d)[^\d,]+(,)` with `\1\2`.
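The capture-group replacement can be sanity-checked the same way. This is a rough sketch in Python rather than Notepad++, and `trim_with_groups` is just a name picked for the example; the find pattern and the `\1\2` replacement are the ones given above.

```python
import re


def trim_with_groups(text):
    # (\d)      keeps the last digit of the model number
    # [^\d,]+   is the suffix to drop (no digits, no commas)
    # (,)       keeps the comma that follows
    return re.sub(r"(\d)[^\d,]+(,)", r"\1\2", text)


line = "Color LaserJet 8500, Color Laserjet 8500N, Color Laserjet 8550 MFP, "
print(trim_with_groups(line))
```

Entries that already end in a digit right before the comma are left untouched, because `[^\d,]+` needs at least one character to match.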

Ωmega
  • 0 occurrences were replaced. 0 occurrences were replaced for the 2nd expression as well – NRGdallas Jun 27 '12 at 17:21
  • I work with regex and have no experience with Notepad++. It works fine with Perl - see http://ideone.com/f6WSE. Does Notepad++ support regex lookaround? Or maybe you need to group the match - try to replace `[^\d,]` with `([^\d,])`. Or you need to add `/` at the beginning and the end of the regex. – Ωmega Jun 27 '12 at 17:30
  • still nothing on this suggestion, sorry – NRGdallas Jun 27 '12 at 17:34
  • Then try to replace `(\d)([^\d,]+)(,)` with `\1\3` – Ωmega Jun 27 '12 at 17:37

How about this:

(.*?\d+)\D*(,)

It will match the entire thing, but you can just grab group 1 and 2. That will leave out the non-digits between the digits and commas.

The replace would be:

\1\2

Here is an SO question that elaborates on why this is the only way to do this.

Or, as Arithmomaniac suggests, you could do this with one group, adding the comma back in after each match:

(.*?\d+)\D*,

The replace would be

\1,
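As a quick check of the single-group version, here is a small Python sketch (Python's `re`, not Notepad++; `trim_lazy` and `trim_no_comma_skip` are names invented for the example). It also shows one caveat: `\D` matches commas too, so `\D*` can run across an entry that contains no digits at all. The second function, which swaps `\D*` for `[^\d,]*`, is my own variation and not part of the answer above.

```python
import re


def trim_lazy(text):
    # The pattern from the answer: keep everything up to the digits,
    # drop the non-digits before the comma, put the comma back.
    return re.sub(r"(.*?\d+)\D*,", r"\1,", text)


def trim_no_comma_skip(text):
    # Variation (an assumption, not from the answer): exclude commas
    # from the dropped run so it cannot span more than one entry.
    return re.sub(r"(.*?\d+)[^\d,]*,", r"\1,", text)


line = "Color Laserjet 8500N, Color Laserjet 8550 MFP, "
print(trim_lazy(line))

# An entry without digits gets swallowed by \D* in the original pattern.
tricky = "Laserjet 8500N, Laserjet Alpha, Laserjet 8500B"
print(trim_lazy(tricky))
print(trim_no_comma_skip(tricky))
```

On the sample data from the question both versions behave the same; they differ only when some entry between two commas has no number in it.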
Justin Pihony
  • Better yet, just drop the `(.*?)` part, and don't bother to capture the comma - add it to the replace string explicitly. Also, make it \D+ - there's no work to be done if it's all numbers. – Arithmomaniac Jun 27 '12 at 17:05
  • Unfortunately, I have next to no experience with regular expressions. Neither the original answer nor the comment here makes much sense to me in terms of how to use them. Is there any way you guys can translate into layman's terms what to put in the find field and what in the replace field? Appreciated! – NRGdallas Jun 27 '12 at 17:08
  • @Arithmomaniac Updated my answer with the suggestion. I don't see how I can drop the `?` lest it become too greedy – Justin Pihony Jun 27 '12 at 17:09
  • output with the suggestion posted is $1,$1,$1,$1,$1,$1,$1,$1, Color Laserjet 8550 MFP – NRGdallas Jun 27 '12 at 17:20
  • Output for Arithmomaniac's suggestion is `Laserjet 8, Laserjet 8500B`. Output for the original is `Laserjet 8500N, Laserjet Alpha, Laserjet 8500B` (it says it replaced things, but nothing changed). This is in response to a deleted comment asking to try the alternate input `Laserjet 8500N, Laserjet Alpha, Laserjet 8500B` – NRGdallas Jun 27 '12 at 17:23
  • @user1298883 Sorry, it seems it should be \1 instead of $1 – Justin Pihony Jun 27 '12 at 17:32
  • output: Color Laserjet 8, Color Laserjet 8, Color Laserjet 8, Color Laserjet 8, Color Laserjet 8, Color Laserjet 8, Color Laserjet 8, Color Laserjet 8, Color Laserjet 8, – NRGdallas Jun 27 '12 at 17:34
  • @user1298883 Add a `+` after `\d`. This `\d+` means it will match 1 or more digits. I have updated my answer. – Justin Pihony Jun 27 '12 at 17:35

Screenshot of the regex in Notepad++: [image: Notepad++ screenshot]

dugas