
How would I get the value between the two quotes after value=?

So, value="hi my name is bob" />
would return: hi my name is bob
or value="Ouch! "that hurt" lol..." />
would return: Ouch! "that hurt" lol...

I know the value=" TEXT_HERE " /> pattern will always occur, and I want the string inside it. And yes, there is always a space before the /> at the end. It is HTML code I am parsing; I have gotten every other field to parse correctly except this one.

EDIT: Let me clarify a little. I can't really use any external tools because I am using WebDriver to parse the page: after I get the source, I put the HTML into a string and then try to parse the "value" attribute out of all that data.
So the regex has to work its way through all kinds of markup and pull out whatever the value field contains. And I need the data from every value field.

Jean-François Corbett
Austin
  • You're not trying to [parse HTML using regex](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags) by any chance, are you? – G_H Oct 28 '11 at 18:23
  • If you output to HTML, try &quot;, but this is not about Java – sergtk Oct 28 '11 at 18:24
  • If it is valid HTML that you are working with, `value="Ouch! "that hurt" lol..." />` should be `value="Ouch! &quot;that hurt&quot; lol..." />` (inner quotes should be escaped). – Briguy37 Oct 28 '11 at 18:25
  • @Briguy37 I don't think that's the way to escape in "HTMLquot; – G_H Oct 28 '11 at 18:26
  • We need some "the-pony-comes" tag for questions about XML or HTML regex "parsing". – G_H Oct 28 '11 at 18:29
  • Aaaand pst is now my new hero :) – G_H Oct 28 '11 at 19:05

6 Answers


You could use String.indexOf() to find the first occurrence of ". Save that index, get the index of the last occurrence with String.lastIndexOf(), and then call String.substring() to extract the text in between.
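A minimal sketch of that approach, assuming the input is a single `value="..." />` fragment rather than the whole page (the class `QuoteSubstring` and helper `betweenOuterQuotes` are hypothetical names). Because it pairs the first quote with the last one, the inner quotes in the second example survive:

```java
public class QuoteSubstring {
    // Returns the text between the first and the last double quote in s,
    // or null if s does not contain at least two double quotes.
    static String betweenOuterQuotes(String s) {
        int first = s.indexOf('"');
        int last = s.lastIndexOf('"');
        if (first < 0 || last <= first) {
            return null;
        }
        return s.substring(first + 1, last);
    }

    public static void main(String[] args) {
        System.out.println(betweenOuterQuotes("value=\"hi my name is bob\" />"));
        System.out.println(betweenOuterQuotes("value=\"Ouch! \"that hurt\" lol...\" />"));
    }
}
```

On a full page source with many `value` attributes, the first and last quotes would belong to different attributes, so each fragment would have to be isolated first.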

Michael Fox

If you are parsing HTML with Java, I suggest using a Java library such as jsoup to make the job easier.

André Ricardo

I recommend using XPath to do the job it was designed for. Here is an example that should get you on the right track:


import java.io.ByteArrayInputStream;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class Test {
  public static void main(String[] args) throws Exception {
    String s = ""
      + "<?xml version=\"1.0\"?>"
      + "<root>"
      + "  <a value=\"hello\" />"
      + "  <b value=\'hello\' />"
      + "  <c value=\"hello &quot;bob&quot;\" />"
      + "</root>";
    // Parse the XML string into a DOM document.
    ByteArrayInputStream bis = new ByteArrayInputStream(s.getBytes());

    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    DocumentBuilder builder = factory.newDocumentBuilder();
    Document d = builder.parse(bis);

    // "//@value" selects every attribute named value, anywhere in the document.
    XPathFactory xpf = XPathFactory.newInstance();
    XPath xpath = xpf.newXPath();
    XPathExpression xpe = xpath.compile("//@value");
    NodeList nl = (NodeList)xpe.evaluate(d, XPathConstants.NODESET);

    // getNodeValue() returns the attribute text with entities such as &quot; decoded.
    for (int i = 0; i < nl.getLength(); i++) {
      System.out.println(nl.item(i).getNodeValue());
    }
  }
}

The output is then:


hello
hello
hello "bob"
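The XPath route only works on well-formed XML, which is why the inner quotes have to be escaped as &quot;. Page source taken from a browser may well be plain HTML rather than XHTML, so it may not parse at all. The sketch below (class `WellFormedCheck` and method `isWellFormed` are hypothetical names) uses the JDK's `DocumentBuilder` to show that the question's raw example is rejected while the escaped form is accepted:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {
    // Returns true if the string parses as XML, false if the parser reports an error.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler ignores warnings and recoverable errors and rethrows
            // fatal errors, so a parse failure is reported only through the exception.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Raw inner quotes, as in the question: not well-formed.
        System.out.println(isWellFormed("<root><c value=\"Ouch! \"that hurt\" lol...\" /></root>"));
        // Inner quotes escaped as &quot;: well-formed.
        System.out.println(isWellFormed("<root><c value=\"Ouch! &quot;that hurt&quot; lol...\" /></root>"));
    }
}
```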
Jiri Patera

You can use a regex to get the value between the quotes, or you can work directly with the string that holds the whole value.

For example, you can use the String.replaceAll method to replace every '"' (double quote) with '' (an empty string).
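A small sketch of the `replaceAll` idea, assuming you have already isolated the quoted text (the class `StripQuotes` and helper `stripQuotes` are hypothetical names). Note that it removes every double quote, including the ones that belong inside the second example's value:

```java
public class StripQuotes {
    // Removes every double quote character from s.
    // This also strips quotes that are part of the value itself.
    static String stripQuotes(String s) {
        return s.replaceAll("\"", "");
    }

    public static void main(String[] args) {
        System.out.println(stripQuotes("\"hi my name is bob\""));
        System.out.println(stripQuotes("\"Ouch! \"that hurt\" lol...\""));
    }
}
```

For a literal character like this, `String.replace("\"", "")` does the same without going through the regex engine.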

Mechkov

In general:

echo 'value="hi my name is bob" />' | perl -nle 'm{value="\s*([^"]*)} and print $1'
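The same pattern translated to Java, assuming the whole page source is held in one string (the class `ValueRegex` and the sample HTML are made up for illustration). Calling `find()` in a loop returns every `value` attribute, which the question asks for. Because `[^"]*` stops at the first double quote, the second sample comes back truncated to `Ouch! `:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ValueRegex {
    // Java version of the Perl pattern value="\s*([^"]*)
    // Group 1 captures everything up to, but not including, the next double quote.
    private static final Pattern VALUE = Pattern.compile("value=\"\\s*([^\"]*)");

    // Collects group 1 of every match in the page source.
    static List<String> values(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = VALUE.matcher(html);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<input value=\"hi my name is bob\" /><input value=\"Ouch! \"that hurt\" lol...\" />";
        for (String v : values(html)) {
            System.out.println("[" + v + "]");
        }
    }
}
```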
JRFerguson

Here is some Java code and a regex pattern that will work for you:

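import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedValue {
    public static void main(String[] args) {
        // A run of digits, word characters, whitespace and quotes
        // that extends to the very end of the input (\z).
        Pattern pattern = Pattern.compile("[\\d\\w\\s'\"]+\\z");
        Matcher matcher = pattern.matcher("value=\"hi my name is bob\"");

        while (matcher.find()) {
            System.out.print("found:'" + matcher.group() + "'");
        }
    }
}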

prints...

found:'"hi my name is bob"'

You'll need to escape the double quotes inside Java string literals with a backslash (\").
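If you can rely on the question's statement that each value always ends with `" />`, a reluctant quantifier anchored on that delimiter keeps the inner quotes and, unlike a pattern ending in `\z`, works on a whole page. This is a sketch under that assumption (the class `ValueBeforeSlash` and the sample strings are hypothetical). It will over-match when another attribute follows `value`, for example `value="x" class="y" />`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ValueBeforeSlash {
    // Assumes every value is terminated by `" />`, as the question states.
    // The reluctant .*? stops at the first `" />`, so inner quotes are kept.
    private static final Pattern VALUE = Pattern.compile("value=\"(.*?)\" />");

    // Collects the captured value of every match in the page source.
    static List<String> values(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = VALUE.matcher(html);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<input value=\"hi my name is bob\" /><input value=\"Ouch! \"that hurt\" lol...\" />";
        for (String v : values(html)) {
            System.out.println(v);
        }
    }
}
```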

Carol Skelly
    H̸̡̼͍͍̩̐͋̄̈ͥē̡̹̼̝̱͉̙̻̋̈́͛ͮ̈͒̾'͔̲̰̗̲͈̊͊̽ͫͣ̚ͅͅş̸͇̋̐̑̄͋ ̬̮͕̅̍ͬͬ̔ͯ̔̀͟a̵̶̼̱̗̣͎̱̗ͫͤ͞l͓̜̥̗͈͍̒͌̆̍͆̏͊͆̈ͅr̥͍͙̱̞̀ͯ̈́ͦ͡e̲̺͕̭̗͗͒ͩ͟ḁ̡̛̹͚̭͔̥͕̘ͬͪ̓ͯd͎͓͔̪̟̦ͣ̈́͆͘͘͠ÿ̘̝̞̜̹̣̹̭́͂̔͊̍̋̏̿͌͘͢ ̼̤̱̯͙̞͈̜͉̑̿̍ͮ͌̉̔͂h͙̭̮̟̤̣̭̟̟̀̀̿e̹͓̼̗̤ͩ̅ͨ̕r̹͕̱ͫ̓͋ĕ̫͎̼̤̟ͤ̎̕.͓͙̱̅ͫ͠.̡͎̪̖̲̯͍̤̟ͣͬ̍̈ͯ.̢͇̮͍̼̲̦̪̭̃͛̉ͧ̏ͥ – G_H Oct 28 '11 at 18:56