From the following text, I want to extract the value associated with a quoted key, for example "hash". The value is everything between the quotation marks that follow the key, in this case:

00000e96c46d15aeaaf9ef6f88a295a8f17207d4cd9ac074d2314680095befc854d5a00600602af2fe03a24b61566ca2d8a6b858b0af840309ae449316833923

My pattern for this would be

Scanner s = new Scanner(new File(path.toString()));
Pattern pattern = Pattern.compile("\"hash\": \".*\"");
String nextMatch = s.findWithinHorizon(pattern, 0);

Explanation of the pattern: I look for a sequence that starts anywhere with a quotation mark, followed by the word hash and another quotation mark. Then comes a ":" plus one space. After that, any amount of text until another quotation mark appears.

Sadly, this pattern does not work and I do not understand why.

{"hash": "00000e96c46d15aeaaf9ef6f88a295a8f17207d4cd9ac074d2314680095befc854d5a00600602af2fe03a24b61566ca2d8a6b858b0af840309ae449316833923", "block": "{\"type\": \"block\", \"transactions\": [], \"timestamp\": \"2017-09-07T07:09:52.628676\", \"reward\": \"d5075b5d43cf97b73bd6483488f1f6a648dc83add93a37bb0817b17331fd51d989e2cf9fd3c8c0206fb89b84cf9e151b7d2123e4f6d71c95868bdfe1f4aa6b9e754a51a8e04bd49f5eec1931840315bc42844b715250534612da5e5809bdb14c496ad1a2d4b00823b80aacb7023667ca6923088b438dc5053d5bbf29a61620b28afa5d52d325ed8aa073a7f3a37e675c6bdf2dad09b809c8f3c60206392764458effb2c512d072af0cc7ea96058e1e19eccc72072939d5d16409843151b55607715f7ea9eff911914be9c88f1e719ed5cc5e95737977feeedbbd96b9150ce5a54c491aa94eab58df129445d89c9f8937c598ba95380a42c22e06ed2f0da4959b331e99e25554c122a095b2520ba3dcff6585c8c07cc6da9d3ad7e71a0ade2c6704c7c27aca3337916794efc4fa1a6e9784bbce1173ee7b408ece86a8a37f84706ed8092c06bb914510a97edffdda55ec09141bbfdf5af7029aa82e5f7e7da1cb1781426fef33721b66e727ea7aef19fb5dea6edc3e16c6d7f08f04f5067dc9a2d0c01015c1af848a1fcd6c64eef039c9c5d8e737c0655a97b6bc876854a34ad94fcd29218524c6c7881bd1ae4a9279edc12f95720d8a010d9a4c7dd19a4415bed2687fb462d95da8436954b5fd82d92b98935650a1fd7fa215ba95e8b20d8594c50cb9a8bc683af32133c007bc0dff3edd36e0c20688385891788de63a5adcbb\", \"difficulty\": \"0\", \"nonce\": \"feec6d57f31d8aee18889026e4e484d96de6b874013a1932018e809c60c45019033389671dcc2e3138a555705cec95e365d79d3e68a909efcf15d0d137770131\", \"parent\": \"00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\"}", "type": "block_hash"}

My entire code:

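import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.URL;
import java.net.URLConnection;
import java.util.Iterator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Assumed: JSONParser and JSONArray come from the json-simple library
import org.json.simple.JSONArray;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;

public class TryToStream {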

    static String url = "SorryICantShowYouThatOne";
    static String charset = "UTF-8";


    public static void main(String[] args) throws IOException, ParseException {
        JSONParser parser = new JSONParser();

        URL getURL = new URL(url + "get?start_at=");
        int counter = 0;
        boolean inputAvail = true;
        //clear textfile
        PrintWriter pw = new PrintWriter("jsonFormatted.txt");


        URL tmpURL = new URL(url + "get?start_at=" + counter);
        URLConnection connection = tmpURL.openConnection();
        InputStream is = connection.getInputStream();
        JSONArray json = (JSONArray) parser.parse(new BufferedReader(new InputStreamReader(is)));
        //   FileOutputStream fos = new FileOutputStream(new File("output2.txt"), true);
        BufferedWriter bw = new BufferedWriter(new FileWriter("jsonFormattedStream.txt"));
        bw.write(json.toJSONString());
        bw.close();

        Iterator iter = json.iterator();
        boolean flagForTesting = true;
        BufferedWriter bw2 = new BufferedWriter(new FileWriter("jsonFormatted.txt"));
        Pattern pattern = Pattern.compile("\"hash\": \"(.*?)\"");

        while (iter.hasNext() && flagForTesting) {

            Matcher matcher = pattern.matcher(iter.next().toString());
            matcher.find();
            System.out.println(matcher.group(1));
            flagForTesting = false;
        }
        bw2.close();


        System.out.println("End");
    }
}

If I try to match the suggested regex, I do not get a match.

The result of iter.next():

{"block":"{\"type\": \"block\", \"transactions\": [], \"timestamp\": \"2017-09-07T07:09:52.628676\", \"reward\": \"d5075b5d43cf97b73bd6483488f1f6a648dc83add93a37bb0817b17331fd51d989e2cf9fd3c8c0206fb89b84cf9e151b7d2123e4f6d71c95868bdfe1f4aa6b9e754a51a8e04bd49f5eec1931840315bc42844b715250534612da5e5809bdb14c496ad1a2d4b00823b80aacb7023667ca6923088b438dc5053d5bbf29a61620b28afa5d52d325ed8aa073a7f3a37e675c6bdf2dad09b809c8f3c60206392764458effb2c512d072af0cc7ea96058e1e19eccc72072939d5d16409843151b55607715f7ea9eff911914be9c88f1e719ed5cc5e95737977feeedbbd96b9150ce5a54c491aa94eab58df129445d89c9f8937c598ba95380a42c22e06ed2f0da4959b331e99e25554c122a095b2520ba3dcff6585c8c07cc6da9d3ad7e71a0ade2c6704c7c27aca3337916794efc4fa1a6e9784bbce1173ee7b408ece86a8a37f84706ed8092c06bb914510a97edffdda55ec09141bbfdf5af7029aa82e5f7e7da1cb1781426fef33721b66e727ea7aef19fb5dea6edc3e16c6d7f08f04f5067dc9a2d0c01015c1af848a1fcd6c64eef039c9c5d8e737c0655a97b6bc876854a34ad94fcd29218524c6c7881bd1ae4a9279edc12f95720d8a010d9a4c7dd19a4415bed2687fb462d95da8436954b5fd82d92b98935650a1fd7fa215ba95e8b20d8594c50cb9a8bc683af32133c007bc0dff3edd36e0c20688385891788de63a5adcbb\", \"difficulty\": \"0\", \"nonce\": \"feec6d57f31d8aee18889026e4e484d96de6b874013a1932018e809c60c45019033389671dcc2e3138a555705cec95e365d79d3e68a909efcf15d0d137770131\", \"parent\": \"00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000\"}","type":"block_hash","hash":"00000e96c46d15aeaaf9ef6f88a295a8f17207d4cd9ac074d2314680095befc854d5a00600602af2fe03a24b61566ca2d8a6b858b0af840309ae449316833923"}

Cœur
InDaPond
  • Don't use RegEx. Parse the string to JSON and extract the value of `hash` property from it. – Tushar Oct 31 '17 at 06:08
  • I wanted to do that, but sadly at some point the document does not seem to be JSON anymore, so I cannot guarantee that it is valid JSON – InDaPond Oct 31 '17 at 06:11

1 Answer

Your regex is nearly there!

The problem with your regex is that `.*` is greedy: it matches as much as it can, so it runs up to the last quote mark on the line, all the way through "block_hash". You just need to tell it to match lazily with `.*?`, so it will stop matching when it encounters the first quote mark.

"hash": ".*?" // notice the question mark!

Now this regex matches:

"hash": "00000e96c46d15aeaaf9ef6f88a295a8f17207d4cd9ac074d2314680095befc854d5a00600602af2fe03a24b61566ca2d8a6b858b0af840309ae449316833923"

If you want to capture the things inside the quotes, I suggest you add a capture group:

"hash": "(.*?)"

You can use this regex like this:

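Pattern pattern = Pattern.compile("\"hash\": \"(.*?)\"");
Matcher matcher = pattern.matcher(yourString);
// find() returns false when nothing matches; calling group(1) then throws IllegalStateException
if (matcher.find()) {
    System.out.println(matcher.group(1));
}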
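
To see the difference between the greedy and the lazy quantifier side by side, here is a minimal self-contained sketch. The class name `GreedyVsLazy`, the helper `firstGroup`, and the shortened sample line are made up for illustration; only the two patterns come from this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {

    // Hypothetical helper: returns capture group 1 of the first match,
    // or null when the regex does not match at all.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the real line; the hash value is made up.
        String line = "{\"hash\": \"00000e96\", \"type\": \"block_hash\"}";

        // Greedy: .* runs to the last quote mark on the line.
        System.out.println(firstGroup("\"hash\": \"(.*)\"", line));
        // prints 00000e96", "type": "block_hash

        // Lazy: .*? stops at the first quote mark after the value starts.
        System.out.println(firstGroup("\"hash\": \"(.*?)\"", line));
        // prints 00000e96
    }
}
```

The lazy form is enough here because the hash is plain hex; a value that itself contains escaped quotes, like the "block" value, would be cut off at its first inner quote.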
Sweeper
  • Look at my updated code, I am still not getting a match – InDaPond Oct 31 '17 at 08:47
  • @InDaPond I don't think there is anything wrong with the regex. Try debugging your code, like printing the value of `iter.next().toString()`. If you still cannot solve the problem, try asking another question. – Sweeper Oct 31 '17 at 21:44
  • `iter.next()` returns something with hash inside, that's why I'm so confused and started the question – InDaPond Nov 01 '17 at 12:15
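
One thing worth checking, judging only from the `iter.next()` output pasted in the question: there the key and value appear as `"hash":"..."` with no space after the colon, whereas the pattern requires `": "` with exactly one space. That alone would make `find()` return false. A minimal sketch of a whitespace-tolerant pattern follows; the class name `HashExtractor`, the method `extractHash`, and the shortened sample strings are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HashExtractor {

    // \s* allows zero or more whitespace characters around the colon,
    // so both "hash": "..." and "hash":"..." are accepted.
    private static final Pattern HASH =
            Pattern.compile("\"hash\"\\s*:\\s*\"(.*?)\"");

    // Returns the hash value, or null when the text contains no hash entry.
    static String extractHash(String text) {
        Matcher m = HASH.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Made-up short values; the layouts mirror the two forms in the question.
        String withSpace = "{\"hash\": \"00000e96\", \"type\": \"block_hash\"}";
        String withoutSpace = "{\"type\":\"block_hash\",\"hash\":\"00000e96\"}";

        System.out.println(extractHash(withSpace));    // prints 00000e96
        System.out.println(extractHash(withoutSpace)); // prints 00000e96
    }
}
```

If the elements of the parsed array are JSON objects, casting each one to the library's object type and reading the `hash` key directly would avoid the regex entirely; the regex route remains useful for the parts of the document that are not valid JSON.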