This regex: \b([A-z*]+)-(?=[A-z*]+\b)

with this replacement: $1 

Applied on:

Jean-Pierre bought "blue-green-red" product-2345 and other blue-red stuff.

Gives me:

Jean Pierre bought "blue green red" product-2345 and other blue red stuff.

While I want:

Jean Pierre bought "blue-green-red" product-2345 and other blue red stuff.

https://regex101.com/r/SJzAaP/1

EDIT:

I am using Clojure (Java)

EDIT 2:

yellow-black-white -> yellow black white

product_a-b -> product_a-b

EDIT 3: Accepted answer translated in Clojure

(clojure.string/replace
 "Jean-Pierre bought \"blue-green-red\" product-2345 and other blue-red-green stuff yellow-black-white product_a-b"
 #"(\"[^\"]*\")|\b([a-zA-Z]+)-(?=[a-zA-Z]+\b)"
 (fn [[s1 s2 s3]] (if s2 s1 (str s3 " "))))

;;=> "Jean Pierre bought \"blue-green-red\" product-2345 and other blue red green stuff yellow black white product_a-b"
leontalbot

3 Answers

In Java, you may use something like

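import java.util.regex.Matcher;
import java.util.regex.Pattern;

String s = "Jean-Pierre bought \"blue-green-red\" product-2345 and other blue-red stuff. yellow-black-white. product_a-b";
StringBuffer result = new StringBuffer();
// Group 1 captures a double-quoted span; Group 2 captures the letters before a hyphen.
Matcher m = Pattern.compile("(\"[^\"]*\")|\\b([a-zA-Z]+)-(?=[a-zA-Z]+\\b)").matcher(s);
while (m.find()) {
    if (m.group(1) != null) {
        // Quoted span: put it back unchanged; quoteReplacement keeps any $ or \ literal.
        m.appendReplacement(result, Matcher.quoteReplacement(m.group(0)));
    } else {
        // Letters-hyphen-letters: keep the letters and swap the hyphen for a space.
        m.appendReplacement(result, m.group(2) + " ");
    }
}
m.appendTail(result);
System.out.println(result.toString());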
// => Jean Pierre bought "blue-green-red" product-2345 and other blue red stuff. yellow black white. product_a-b

See the Java demo.

The regex is

("[^"]*")|\b([a-zA-Z]+)-(?=[a-zA-Z]+\b)

Details

  • ("[^"]*") - Group 1: a ", then 0+ chars other than ", then a closing "
  • | - or
  • \b - word boundary
  • ([a-zA-Z]+) - Group 2: 1+ letters (may be replaced with (\p{L}+) to match any letter)
  • - - a hyphen
  • (?=[a-zA-Z]+\b) - a positive lookahead that, immediately to the right of the current location, requires 1+ letters and a word boundary.

If Group 1 matches (if (m.group(1) != null)) you just paste the match back into the result. If not, paste back Group 2 value and a space.
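
On Java 9 or later, the same keep-or-replace decision can probably be written more compactly with the `Matcher.replaceAll` overload that takes a function. The following is only a sketch of that variant, not the code from the answer: the class name `HyphenToSpace` and the method name `dehyphenate` are made up for illustration, and `Matcher.quoteReplacement` is added on the assumption that quoted spans may contain `$` or `\`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HyphenToSpace {
    // Same pattern as in the answer: a quoted span (Group 1) OR letters (Group 2)
    // followed by a hyphen that is itself followed by letters and a word boundary.
    static final Pattern PATTERN =
        Pattern.compile("(\"[^\"]*\")|\\b([a-zA-Z]+)-(?=[a-zA-Z]+\\b)");

    // Hypothetical helper name; requires Java 9+ for replaceAll(Function).
    static String dehyphenate(String input) {
        return PATTERN.matcher(input).replaceAll(mr ->
            mr.group(1) != null
                // Quoted span: return the whole match untouched.
                ? Matcher.quoteReplacement(mr.group())
                // Otherwise keep the letters and turn the hyphen into a space.
                : Matcher.quoteReplacement(mr.group(2) + " "));
    }

    public static void main(String[] args) {
        String s = "Jean-Pierre bought \"blue-green-red\" product-2345 and other blue-red stuff. "
                 + "yellow-black-white. product_a-b";
        System.out.println(dehyphenate(s));
    }
}
```

The explicit `appendReplacement` loop remains the option that also works on Java 8.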

Adding code here from the question, too, for better visibility:

(def s "Jean-Pierre bought \"blue-green-red\" product-2345 and other blue-red stuff. yellow-black-white. product_a-b")

;; The replacement fn receives [whole-match group-1 group-2].
(defn append [[whole quoted word]] (if quoted whole (str word " ")))

(clojure.string/replace s #"(\"[^\"]*\")|\b([a-zA-Z]+)-(?=[a-zA-Z]+\b)" append)

;;=> "Jean Pierre bought \"blue-green-red\" product-2345 and other blue red stuff. yellow black white. product_a-b"
leontalbot
Wiktor Stribiżew

This should work as long as you don't need to handle overly complex cases:

(?: |^)\w+(-)(?![0-9])\w+

This matches any instance of word(hyphen)word that has a space at the beginning or is the beginning of the line (so, the stuff in the quotes will not match because there would be a quote before it, not a space or the beginning of the line).
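
To see which substrings this pattern actually picks up, a small Java check along these lines may help. It is only an illustrative sketch: the class name `SpaceAnchoredDemo` and the helper `matches` are hypothetical, and the code merely lists the matches without attempting a replacement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpaceAnchoredDemo {
    // The pattern from this answer, escaped for a Java string literal.
    static final Pattern PATTERN = Pattern.compile("(?: |^)\\w+(-)(?![0-9])\\w+");

    // Hypothetical helper: collects every full match found in the input.
    static List<String> matches(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // The quoted words and product-2345 are skipped.
        System.out.println(matches(
            "Jean-Pierre bought \"blue-green-red\" product-2345 and other blue-red stuff."));
        // Only the first hyphen of a longer chain is part of a match.
        System.out.println(matches("other blue-red-green-yellow stuff"));
        // \w includes the underscore, so this one matches as well.
        System.out.println(matches("product_a-b"));
    }
}
```

Note that each match after the first one includes the leading space, and that the hyphen itself sits in Group 1.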

Let me know if this doesn't work for you. Live demo.

miike3459
  • Hmm, that works, except it wouldn't handle `other blue-red-green-yellow stuff`, which I want to cover too, and I don't want the hyphen removed from `product_a-b` – leontalbot Nov 23 '18 at 20:52

Try this one

(".*?")|((?<group>\b([A-z*]+))-)

with substitution

${group} $1

You can test it here

Alfredo A.