I've added emojis to my Android application, and I've been using regular expressions in Java to match the codes assigned to them. Each code is wrapped in a pair of delimiters, and every match is shown as an image.

Some emoji codes are, for example, sad, happy, smile.

So far, it's been like this:

  • Delimiters: ( and )

  • Regular expression: \\(([.[^\\(\\)]]+)\\)

  • Example of emoji codes matched: (sad), (happy), (smile).

I've noticed, though, that for some new emojis I added, it would be more practical for the user to type the codes with another pair of delimiters, such as the letter z and a comma (,). The second case would then look like this:

  • Delimiters: z and ,

  • Regular expression: z([.[^z\\,]]+)\\,

  • Example of emoji codes matched: zsad,, zhappy,, zsmile,.

What I want, then, is to merge these two regular expressions so that the user can type the emoji code with either pair of delimiters, whichever he or she prefers, and it will be matched. For example, the sad emoji would be matched and shown as an image every time it's written as either (sad) or zsad,, as in:

Hi. (sad) I've got bad news. zsad,

Hey... (sad)

Okay. Bye. zsad,

I've tried using the alternation operator and lookarounds with no success. With the following two regular expressions, I only got matches for what is to the left of the | operator (and I want matches for both the left and right sides, of course):

\\(([.[^\\(\\)]]+)\\)|z([.[^z\\,]]+)\\,

z([.[^z\\,]]+)\\,|\\(([.[^\\(\\)]]+)\\)

And with the following regular expressions, I got no matches at all:

(\\(([.[^\\(\\)]]+)\\)|z([.[^z\\,]]+)\\,), (\\(([.[^\\(\\)]]+)\\))|(z([.[^z\\,]]+)\\,)

(z([.[^z\\,]]+)\\,|\\(([.[^\\(\\)]]+)\\)), (z([.[^z\\,]]+)\\,)|(\\(([.[^\\(\\)]]+)\\))

\\(|z([.[^\\(\\z\\,)]]+)\\)|\\,, (\\(|z)([.[^\\(\\z\\,)]]+)(\\)|\\,) (\\()|(z)([.[^\\(\\z\\,)]]+)(\\))|(\\,)

(?=\\(([.[^\\(\\)]]+)\\))(?=z([.[^z\\,]]+)\\,), (?=.*\\(([.[^\\(\\)]]+)\\))(?=.*z([.[^z\\,]]+)\\,)

Sorry for the gigantic text; I only wanted to give as much detail as possible. Does anyone know what I am doing wrong, and what regular expression I can use so that it matches both zemojicode, and (emojicode)? Your help will be very much appreciated.

Rob
  • Java does not let you use duplicate names for capture groups, nor does it support branch reset or conditional expressions. You need to use alternation and then act depending on how you need to process the matches: `\(([.[^()]]+)\)|z([.[^z,]]+),` (double the backslashes in Java, of course; this one can be used at the [online Java regex tester](http://www.ocpsoft.org/tutorials/regular-expressions/java-visual-regex-tester/)). Check [this demo](http://ideone.com/kXhOKJ) – Wiktor Stribiżew Apr 25 '16 at 10:26
  • BTW, why do you have a dot in your pattern? – Wiktor Stribiżew Apr 25 '16 at 10:32
  • I converted my comment into an answer. – Wiktor Stribiżew Apr 25 '16 at 11:23

3 Answers

I'd probably go with

\((\w+)\)|z(\w+),

which I find simpler and which, like your own attempts, captures just the actual token. The \w also allows digits and underscores in the token; I don't know whether you consider that a plus, but it should hardly be a drawback.

So, as a Java string:

 \\((\\w+)\\)|z(\\w+),

Check it out here, at regex101.
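As a rough sketch of how the two capture groups could be read in Java: only one of them takes part in any given match, so the other one comes back as null. The class name `EmojiTokens` and the method `tokens` are made up for this illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class EmojiTokens {
    // The pattern from the answer, written as a Java string literal.
    private static final Pattern EMOJI = Pattern.compile("\\((\\w+)\\)|z(\\w+),");

    // Returns the emoji codes found in the text, without their delimiters.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = EMOJI.matcher(text);
        while (m.find()) {
            // Group 1 is set for "(code)" matches, group 2 for "zcode," matches;
            // the group from the branch that did not match is null.
            String token = m.group(1) != null ? m.group(1) : m.group(2);
            result.add(token);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [sad, sad]
        System.out.println(tokens("Hi. (sad) I've got bad news. zsad,"));
    }
}
```

Reading only `group(1)` would make it look as if just the left-hand side of the alternation ever matched, so checking both groups is the important part here.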

As an alternative, I'd like to mention this one:

[(z](\w+)[),]

It's even simpler, but it doesn't have the built-in syntax check. In other words, it would allow mixed delimiters, e.g. (sad, and zhappy), which may be considered a drawback.
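To make the difference concrete, here is a small sketch that runs both patterns against inputs with mixed delimiters. The class `DelimiterCheck` and its members are names invented for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class DelimiterCheck {
    // Alternation: each opening delimiter must be closed by its own partner.
    static final Pattern STRICT = Pattern.compile("\\((\\w+)\\)|z(\\w+),");
    // Character classes: any opening delimiter may pair with any closing one.
    static final Pattern LENIENT = Pattern.compile("[(z](\\w+)[),]");

    // True if the pattern matches anywhere in the text.
    static boolean containsEmoji(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsEmoji(STRICT, "(sad,"));    // prints false
        System.out.println(containsEmoji(LENIENT, "(sad,"));   // prints true
        System.out.println(containsEmoji(LENIENT, "zhappy)")); // prints true
    }
}
```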

Regards

SamWhan
  • I like your initial regex. If one wanted to match *just* the token, one could use lookarounds, which would avoid dealing with 2 captured groups (the entire match is the token): `(?<=\()\w+(?=\))|(?<=z)\w+(?=,)` – Bohemian Apr 25 '16 at 11:28
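A minimal sketch of the lookaround variant from the comment above, assuming a helper class named `LookaroundTokens` (a name chosen only for this example). Because the delimiters are checked by lookbehind and lookahead, `group()` already holds the bare token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class LookaroundTokens {
    // The delimiters are asserted but not consumed, so they are not part of the match.
    private static final Pattern TOKEN =
            Pattern.compile("(?<=\\()\\w+(?=\\))|(?<=z)\\w+(?=,)");

    // Returns every token; no capture groups need to be inspected.
    static List<String> tokens(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            result.add(m.group()); // the whole match is the emoji code
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [sad, sad]
        System.out.println(tokens("Hi. (sad) I've got bad news. zsad,"));
    }
}
```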

You could use something like this:

(z[a-zA-Z]*,|\([a-zA-Z]*\))

Here's the example

It will capture z<anylettershere>, or (<anylettershere>)

To match more than one in a message, use the global flag, which will probably be needed; it is included in the example link. It matches the sentences you provided on three separate Java regex testers that I found.

Edit

Just a note: any of the \ characters may need to be doubled. I primarily use PHP rather than Java, so I am not as knowledgeable about that, but the example given would then become:

(z[a-zA-Z]*,|\\([a-zA-Z]*\\))
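Java has no global flag as such; calling `Matcher.find()` in a loop walks through every match in the message. The sketch below assumes a helper class called `AllMatches` (a name invented here) and collects each whole match, delimiters included.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class AllMatches {
    // The doubled-backslash form of the pattern from this answer.
    private static final Pattern EMOJI =
            Pattern.compile("(z[a-zA-Z]*,|\\([a-zA-Z]*\\))");

    // Collects every match in the text, including the delimiters.
    static List<String> matches(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = EMOJI.matcher(text);
        while (m.find()) { // each call moves on to the next match
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [(sad), zsad,]
        System.out.println(matches("Hi. (sad) I've got bad news. zsad,"));
    }
}
```

Note that `*` also accepts zero letters, so an empty pair such as `()` matches too; replacing `*` with `+` would require at least one letter.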
KyleFairns

Java does not let you use duplicate names for capture groups, nor does it support branch reset or conditional expressions. You need to use alternation and then act depending on how you need to process the matches.

So, use this regex:

\(([.[^()]]+)\)|z([.[^z,]]+),

Do not forget to double the backslashes in Java code.

Check this demo that only handles the match values:

String s = "Hi. (sad) I've got bad news. zsad,\nHey... (sad)\nOkay. Bye. zsad,";
System.out.println(s.replaceAll("\\(([.[^()]]+)\\)|z([.[^z,]]+),", "<<$0>>")); 

Output:

Hi. <<(sad)>> I've got bad news. <<zsad,>>
Hey... <<(sad)>>
Okay. Bye. <<zsad,>>
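If the emoji code itself is needed rather than the whole match, one possible way to "act depending on the match" is to check which group is non-null. The sketch below is only an assumption about how that could look: the class `EmojiRenderer` and the `[img:...]` placeholder are invented for illustration and stand in for whatever the application does to show an image.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
class EmojiRenderer {
    // The regex from this answer with the backslashes doubled.
    private static final Pattern EMOJI =
            Pattern.compile("\\(([.[^()]]+)\\)|z([.[^z,]]+),");

    // Replaces each emoji code with a placeholder such as [img:sad].
    static String render(String text) {
        Matcher m = EMOJI.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Group 1 belongs to the "(code)" branch, group 2 to the "zcode," branch.
            String code = m.group(1) != null ? m.group(1) : m.group(2);
            // quoteReplacement keeps any '$' or '\' in the code from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement("[img:" + code + "]"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints Hi. [img:sad] I've got bad news. [img:sad]
        System.out.println(render("Hi. (sad) I've got bad news. zsad,"));
    }
}
```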
Wiktor Stribiżew