I checked several posts about removing duplicated words from a string in JavaScript (in my case a word means a substring delimited by spaces). Of the regular expressions I found on the internet, /(\b\S+\b)(?=.*\b\1\b)/g handles almost all cases, but it produces some mismatches that I cannot explain. For example, it removes characters such as , / and - when they are part of a word, that is, before the next space has been reached. I guess it has to do with the word boundary metacharacter \b, but I am not able to find a solution for that.
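A minimal way to reproduce the problem in JavaScript, using one of the sample lines listed further down. The helper name stripMatches is only for illustration; the regex is the one quoted above, unchanged.

```javascript
// The regex from the question, unchanged.
const dedupe = /(\b\S+\b)(?=.*\b\1\b)/g;

// Hypothetical helper: deletes everything the regex matches.
function stripMatches(line) {
  return line.replace(dedupe, "");
}

// "/" and "01" are matched on their own, because \b also sits
// between a word character and "/" inside a space-delimited word.
console.log(stripMatches("a/b 01/01")); // "ab /01"

// "_" is a word character, so there is no \b inside "test_1".
console.log(stripMatches("test_1 test_2")); // "test_1 test_2"
```

This seems to suggest that \b lets the capture group start and end in the middle of a space-delimited word whenever a punctuation character sits next to a letter or digit.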
For example, I have the following string samples:
123-1 123-2 test-1 test-1 w/e 10/04/20
Company w/e 09/06/20 083020-090620
a/b 01/01
test_1 test_2
a/b a/b
Inv 50049 50049 Inv 50195 PrjPAN02
Inv 51360-1, 51366-7; 51372 Inv 51360-1, 51366-7; 372 PrjPAN02
Inv 51360-1, 51366-7; 51372 51372 Inv 513601, 51366-7; 372 PrjPAN02
55009, 55017, 55022 55001, 55022, 55025
55254, 61 55246,66,69
55733, 41, 44 55727, 45,48
57269, 71,74,75, 57354 57266, 73
57437, 38, 41, 43 57434, 40
w/e 09/20/20 091320-092020
and it generates the following output. You can test it here: Regex101
1232 test-1 we 1004/20
Company we 0906/20 083020-090620
ab /01
test_1 test_2
a/b
50049 Inv 50195 PrjPAN02
, ; 51372 Inv 513601, 51366-7; 372 PrjPAN02
513601, ; 51372 Inv 513601, 51366-7; 372 PrjPAN02
55009, 55017, 55001, 55022, 55025
55254, 61 5524666,69
55733, 41, 44 55727, 45,48
57269, 7174,75, 57354 57266, 73
57437, 38, 41, 43 57434, 40
we 09/20 091320-092020
I would expect the following output:
123-1 123-2 test-1 w/e 10/04/20
Company w/e 09/06/20 083020-090620
a/b 01/01
test_1 test_2
a/b
50049 Inv 50195 PrjPAN02
51372 Inv 51360-1, 51366-7; 372 PrjPAN02
51360-1, 51372 Inv 513601, 51366-7; 372 PrjPAN02
55009, 55017, 55022 55001, 55022, 55025
55254, 61 55246,66,69
55733, 41, 44 55727, 45,48
57269, 71,74,75, 57354 57266, 73
57437, 38, 41, 43 57434, 40
w/e 09/20/20 091320-092020
I would expect every repeated string delimited by spaces to be removed, but in some cases the regex removes a slash (/), a hyphen (-), or a comma (,) from inside a string that is delimited by spaces.
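To make the expected behaviour concrete, here is a rough non-regex sketch of what I mean. It assumes words are separated by whitespace only and that the last occurrence of each repeated word is the one to keep, which is what the expected output above shows. The function name removeDuplicateTokens is made up for this example.

```javascript
// Hypothetical reference implementation of the expected behaviour:
// split on whitespace, then keep a token only if it is the last
// occurrence of that exact token in the line.
function removeDuplicateTokens(line) {
  const tokens = line.trim().split(/\s+/);
  return tokens
    .filter((token, index) => tokens.lastIndexOf(token) === index)
    .join(" ");
}

console.log(removeDuplicateTokens("123-1 123-2 test-1 test-1 w/e 10/04/20"));
// "123-1 123-2 test-1 w/e 10/04/20"

console.log(removeDuplicateTokens("Inv 50049 50049 Inv 50195 PrjPAN02"));
// "50049 Inv 50195 PrjPAN02"

console.log(removeDuplicateTokens("a/b 01/01"));
// "a/b 01/01"
```

Tokens are compared as whole strings here, so "55022" and "55022," count as different words. What I am looking for is a regular expression that behaves the same way.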
I checked the following similar question to try to find a regular expression that would match all the cases: