
Hi everyone. I want to write a condition that matches/validates the string pattern shown below.

METRICS__video1::[VIEWS=1000,LIKES=20,DISLIKES=20]

There should be no spaces. Two strings should be joined by two underscores (__) before the double colon (::). After the colons, the rest should be enclosed in square brackets ([]). Inside the brackets there should be comma-separated (,) pairs, each with a string before the equals sign (=) and an integer after it.

Any ideas, or what is the best way to do this? Thanks in advance.

  • You can do this by using regex: https://www.freeformatter.com/java-regex-tester.html – nourhero Jun 25 '20 at 21:06
  • Does this answer your question? [How to extract a substring using regex](https://stackoverflow.com/questions/4662215/how-to-extract-a-substring-using-regex) – Kitswas Jun 25 '20 at 21:22

2 Answers


You can match that string with this regex (Java version; the backslashes are doubled because the pattern is written as a Java string literal):

METRICS__video1::[VIEWS=1000,LIKES=20,DISLIKES=20]
String pattern = "\\w+__\\w+::\\[\\w+=\\d+(,\\w+=\\d+)+\\]";

Explanation:

  • \\w+ one or more word characters (letters, digits or underscore) ===> METRICS
  • __ the two underscores ===> __
  • \\w+ one or more word characters (letters, digits or underscore) ===> video1
  • :: the two colons ===> ::
  • \\[ the opening square bracket, escaped because it has a special meaning in regexes ===> [
  • \\w+=\\d+ the first pair: one or more word characters, an equals sign, and one or more digits ===> VIEWS=1000
  • (,\\w+=\\d+)+ a group starting with a comma, followed by one or more word characters, an equals sign, and one or more digits; the final + means the group repeats one or more times ===> (,LIKES=20)(,DISLIKES=20)
  • \\] the closing square bracket, also escaped ===> ]
assylias
  • Oops. What if there’s only 1 group? The standard approach, and the only one that works in all cases, is to move the comma to the start of the second part of the CSV pattern. Also, the pattern itself doesn’t have double backslashes; they appear only when it is coded as a String literal in Java, e.g. if you copy-pasted your solution into a regex tester it would fail – Bohemian Jun 25 '20 at 21:36
  • Yup, you win :-) – assylias Jun 25 '20 at 21:40
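
For readers following the comment above, here is a minimal sketch of one way to accept a single pair as well: keep the leading-comma group but make it optional with * instead of +. The class name MetricsPatternCheck and the helper isValid are made up for this illustration and are not part of the original answer.

```java
import java.util.regex.Pattern;

public class MetricsPatternCheck {
    // Same shape as the pattern in the answer, but the trailing group uses *
    // (zero or more repetitions), so a single KEY=VALUE pair is accepted too.
    // The doubled backslashes are only Java string-literal escaping.
    static final Pattern METRICS =
            Pattern.compile("\\w+__\\w+::\\[\\w+=\\d+(,\\w+=\\d+)*\\]");

    // Hypothetical helper: true only if the whole string fits the pattern.
    static boolean isValid(String s) {
        return METRICS.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Three pairs, as in the question: prints true
        System.out.println(isValid("METRICS__video1::[VIEWS=1000,LIKES=20,DISLIKES=20]"));
        // A single pair: prints true
        System.out.println(isValid("METRICS__video1::[VIEWS=1000]"));
        // Trailing comma: prints false
        System.out.println(isValid("METRICS__video1::[VIEWS=1000,]"));
        // A space after the comma: prints false
        System.out.println(isValid("METRICS__video1::[VIEWS=1000, LIKES=20]"));
        // The raw regex with single backslashes, as a regex tester expects it
        System.out.println(METRICS.pattern());
    }
}
```

The last line prints the pattern as the regex engine sees it, which is the form to paste into an online tester.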

This regular expression should work for what you described:

if (Pattern.matches("\\w+__\\w+::\\[((\\w+=\\d+)(,(?=\\w)|\\]$))+", yourStringHere)) {
    /* DO SOME STUFF */
}

https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html

Lev M.
  • This allows any number (including zero) of commas between terms – Bohemian Jun 25 '20 at 21:37
  • @Bohemian you are correct, I fixed it to a more precise version. – Lev M. Jun 25 '20 at 22:15
  • better, but this still matches `METRICS__video1::[VIEWS=1000,LIKES=20,DISLIKES=20,` – Bohemian Jun 25 '20 at 23:31
  • Nope. Now it doesn’t match when the last term is a single letter, eg `METRICS__video1::[VIEWS=1000,LIKES=20,X=20`. The reason is `[^$]` doesn’t do what you hope it might; inside a character class most characters lose their special regex meaning, including `$`: `[^$]` means “not a dollar sign”, so it consumes the first letter of the next term. Hint: to make your approach (which is quite nice because it doesn’t repeat the term pattern) work you need a look ahead. – Bohemian Jun 26 '20 at 21:31
  • @Bohemian I do appreciate your dedication in checking this answer. I did not consider any of these edge cases, and the test I did on the last iteration was not good enough. This is a nice learning experience. – Lev M. Jun 26 '20 at 21:56
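
Since most of this thread is about edge cases, a small check like the one below may help when trying out variations. It is only a sketch: the regex is copied from the answer as it currently stands (with the look-ahead), while the class name LookaheadPatternCheck, the helper isValid and the choice of test inputs are assumptions made for this illustration.

```java
import java.util.regex.Pattern;

public class LookaheadPatternCheck {
    // Pattern copied from the answer: each term must be followed either by a
    // comma that is itself followed by a word character, or by the final "]".
    static final String REGEX = "\\w+__\\w+::\\[((\\w+=\\d+)(,(?=\\w)|\\]$))+";

    // Hypothetical helper: Pattern.matches requires the whole input to match.
    static boolean isValid(String s) {
        return Pattern.matches(REGEX, s);
    }

    public static void main(String[] args) {
        String[] inputs = {
            "METRICS__video1::[VIEWS=1000,LIKES=20,DISLIKES=20]", // valid
            "METRICS__video1::[VIEWS=1000]",                      // valid, single term
            "METRICS__video1::[VIEWS=1000,LIKES=20,X=20]",        // valid, one-letter key
            "METRICS__video1::[VIEWS=1000,LIKES=20,DISLIKES=20,", // invalid, trailing comma
            "METRICS__video1::[VIEWS=1000,,LIKES=20]",            // invalid, doubled comma
            "METRICS__video1::[VIEWS=1000LIKES=20]",              // invalid, missing comma
            "METRICS__video1::[VIEWS=1000, LIKES=20]"             // invalid, contains a space
        };
        for (String input : inputs) {
            System.out.println(isValid(input) + "  " + input);
        }
    }
}
```

The first three inputs should print true and the remaining four false.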