I'm terrible at regex and would greatly appreciate any help with this issue, which I think will be basic stuff for anyone familiar with it.

I'm getting a response like this from a REST call:

    {"responseData":{"translatedText":"Ciao mondo"},"responseDetails":"","responseStatus":200,"matches":[{"id":"424913311","segment":"Hello World","translation":"Ciao mondo","quality":"74","reference":"","usage-count":50,"subject":"All","created-by":"","last-updated-by":null,"create-date":"2011-12-29 19:14:22","last-update-date":"2011-12-29 19:14:22","match":1},{"id":"0","segment":"Hello World","translation":"Ciao a tutti","quality":"70","reference":"Machine Translation provided by Google, Microsoft, Worldlingo or the MyMemory customized engine.","usage-count":1,"subject":"All","created-by":"MT!","last-updated-by":null,"create-date":"2012-05-14","last-update-date":"2012-05-14","match":0.85}]}

All I need is the 'Ciao mondo' between those quotation marks. I was hoping to do this with Java's `String.split`, but unfortunately I couldn't see how to give it two separate delimiters; otherwise I could have specified the text before the translation.

To simplify, what I'm stuck on is the regex to capture whatever is in between `translatedText":"` and the next `"`.

I'd be very grateful for any help.

CitizenSmif
  • [You're asking an XY question.](http://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) Regex is the wrong tool for the job. Are you sure you wouldn't rather parse the JSON? [See this](http://meta.stackexchange.com/a/66378/182647). – Li-aung Yip May 14 '12 at 03:09
  • And this [isn't even the first time](http://stackoverflow.com/questions/9832954/regex-issue-scraping-youtube) you've tried to apply regexes to problems better solved by more specific tools. As per my answer to your earlier question, trying to invent your own mini-parser for a language where parsers *already exist* is a losing game. Use a JSON parser. – Li-aung Yip May 14 '12 at 03:25
  • Wow. This actually worked perfectly and was very easy to implement. As this wasn't a proper 'answer', do you know how I finalize the question? – CitizenSmif May 14 '12 at 03:34
  • You don't *have* to accept any answer. You could accept DasBlinkenlight's answer (which is a perfect answer to your original question.) – Li-aung Yip May 14 '12 at 03:46

3 Answers

You can use the expression `\"translatedText\":\"([^\"]*)\"` (written here with Java string escapes) to capture the match.

The expression means: find the quoted key translatedText followed by a colon and an opening quote, then match every character up to (but not including) the next quote, and capture those characters in a capturing group.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Fragment: place the statements below inside a method.
    String s = " {\"responseData\":{\"translatedText\":\"Ciao mondo\"},\"responseDetails\":\"\",\"responseStatus\":200,\"matches\":[{\"id\":\"424913311\",\"segment\":\"Hello World\",\"translation\":\"Ciao mondo\",\"quality\":\"74\",\"reference\":\"\",\"usage-count\":50,\"subject\":\"All\",\"created-by\":\"\",\"last-updated-by\":null,\"create-date\":\"2011-12-29 19:14:22\",\"last-update-date\":\"2011-12-29 19:14:22\",\"match\":1},{\"id\":\"0\",\"segment\":\"Hello World\",\"translation\":\"Ciao a tutti\",\"quality\":\"70\",\"reference\":\"Machine Translation provided by Google, Microsoft, Worldlingo or the MyMemory customized engine.\",\"usage-count\":1,\"subject\":\"All\",\"created-by\":\"MT!\",\"last-updated-by\":null,\"create-date\":\"2012-05-14\",\"last-update-date\":\"2012-05-14\",\"match\":0.85}]}";
    System.out.println(s);
    // Group 1 captures everything between "translatedText":" and the next quote.
    Pattern p = Pattern.compile("\"translatedText\":\"([^\"]*)\"");
    Matcher m = p.matcher(s);
    if (!m.find()) return; // no translatedText key in the input
    System.out.println(m.group(1));

This fragment prints the input string, followed by `Ciao mondo`.
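One limitation of `[^\"]*` is that it stops at the first quote, even an escaped one inside the value. The following is a minimal sketch of one way to tolerate backslash escapes; the class name `TranslatedTextExtractor` and the method `extract` are hypothetical names for illustration, and the returned value is still in its escaped JSON form. For anything beyond simple cases, a JSON parser remains the safer choice, as the comments on the question suggest.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TranslatedTextExtractor {
    // Regex: "translatedText"\s*:\s*"((?:[^"\\]|\\.)*)"
    // The group accepts any character except a quote or backslash,
    // or a backslash followed by any single character (e.g. \").
    private static final Pattern TRANSLATED_TEXT =
            Pattern.compile("\"translatedText\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Returns the raw (still escaped) value, or null if the key is absent. */
    public static String extract(String json) {
        Matcher m = TRANSLATED_TEXT.matcher(json);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical input whose value contains escaped quotes.
        String json = "{\"responseData\":{\"translatedText\":\"Ciao \\\"mondo\\\"\"},\"responseStatus\":200}";
        System.out.println(extract(json));
    }
}
```

The pattern also allows optional whitespace around the colon, which is an assumption about how the service might format its output rather than something the sample response shows.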

Sergey Kalinichenko

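Use look-behind and look-ahead to gather strings inside quotation marks: (?<=[,.{}:]\").*?(?=\"). Note that this matches any quoted string that follows one of `, . { } :`, so for the sample response it prints the keys and the other string values too, not only the translation.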

import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Test
{
    public static void main(String[] args)
    {
        Scanner scanner = new Scanner(System.in);
        String in = scanner.nextLine();

        Matcher matcher = Pattern.compile("(?<=[,.{}:]\\\").*?(?=\\\")").matcher(in);

        while(matcher.find())
            System.out.println(matcher.group());
    }
}
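
If only the translation is wanted, the same look-around idea can be narrowed by putting the key itself in the look-behind. This is a sketch under the assumption that the key appears exactly as `"translatedText":"` with no extra whitespace, as in the sample response; `LookaroundExample` and `firstTranslation` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaroundExample {
    // Regex: (?<="translatedText":")[^"]*(?=")
    // The look-behind pins the match to the text right after the key,
    // and the look-ahead requires a closing quote, so group() is the value itself.
    private static final Pattern VALUE =
            Pattern.compile("(?<=\"translatedText\":\")[^\"]*(?=\")");

    /** Returns the first translatedText value, or null if none is found. */
    public static String firstTranslation(String json) {
        Matcher m = VALUE.matcher(json);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String json = "{\"responseData\":{\"translatedText\":\"Ciao mondo\"},\"responseDetails\":\"\"}";
        System.out.println(firstTranslation(json));
    }
}
```

Because the look-behind has a fixed length, Java's regex engine accepts it; the value is returned by `group()` directly, with no capturing group needed.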
Untitled

Try this regular expression:

^.*translatedText":"([^"]*)"},"responseDetails".*$

The matching group will contain the text Ciao mondo.

This assumes that translatedText and responseDetails will always occur in the positions specified in your sample.
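
As a rough sketch of how this anchored expression might be used from Java: since it starts with `^.*` and ends with `.*$`, it is meant to match the whole response, so `Matcher.matches()` fits. The class name `AnchoredMatchExample` and the method `extract` are hypothetical, and the input is assumed to be a single line, because `.` does not match line breaks by default.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AnchoredMatchExample {
    // Regex: ^.*translatedText":"([^"]*)"},"responseDetails".*$
    // Requires translatedText to be followed directly by }, and the responseDetails key.
    private static final Pattern WHOLE_RESPONSE =
            Pattern.compile("^.*translatedText\":\"([^\"]*)\"},\"responseDetails\".*$");

    /** Returns the captured translation, or null if the layout differs. */
    public static String extract(String json) {
        Matcher m = WHOLE_RESPONSE.matcher(json);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String json = "{\"responseData\":{\"translatedText\":\"Ciao mondo\"},\"responseDetails\":\"\",\"responseStatus\":200}";
        System.out.println(extract(json));
    }
}
```

If the service ever reorders its fields, `extract` returns null instead of a wrong value, which is the trade-off of anchoring on the surrounding keys.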

Pradeep
  • It also assumes that the `Translated Text` will only contain `[a-zA-Z\s]`. What if the translated text is `¡Santo cielo! ¿Es que una vaca?`? Or possibly the same thing in Russian - `Святая корова! Это корова?` – Li-aung Yip May 14 '12 at 03:29
  • Then Houston we have a problem! I like dasblinkenlight's solution. I am revising mine to accept everything that's not a double quote. – Pradeep May 14 '12 at 03:33
  • @Li-aungYip That's a poor translation - the `¡Santo cielo!` is commonly translated as `Боже мой!` – Sergey Kalinichenko May 14 '12 at 03:39
  • @dasblinkenlight: Google Translate gave that to me (source text: "Holy cow! Is that a cow?") Perhaps you should rate the Google Translation for improvement. ;) – Li-aung Yip May 14 '12 at 03:59