
I have data in the following format:

<foo bar> <property abc> <this foo bar> .

There are essentially four parts in this string: `foo bar`, `property abc`, `this foo bar`, and `.`. How do I tokenize the above string into these four parts?

Joshua Taylor
frazman
  • What are you trying to parse? (just in case there's some library that does it for you) – Dennis Meng Aug 26 '13 at 20:13
  • Googling "rdf parser java" got me http://stackoverflow.com/questions/73445/what-are-some-good-java-rdf-libraries – Dennis Meng Aug 26 '13 at 20:17
  • If it's RDF why wouldn't you use an RDF library????? – Dave Newton Aug 26 '13 at 20:18
  • possible duplicate of [Using Regular Expressions to Extract a Value in Java](http://stackoverflow.com/questions/237061/using-regular-expressions-to-extract-a-value-in-java) – SwiftMango Aug 26 '13 at 20:21
  • Note that as far as RDF serializations go, the snippet you've shown could be N-Triples, Turtle, or N3. [N-Triples](http://www.w3.org/TR/rdf-testcases/#ntriples) is a line-based format, and parsing each line as you've requested will be fine, except that the object in each triple may also be a literal, which would not look like `<...>`. Turtle and N3 allow much more complicated expressions in addition to these simple expressions, and the line-based approach will not work on those serializations. It would be much better to use a dedicated RDF parser than to roll your own and run into problems. – Joshua Taylor Aug 26 '13 at 20:35
  • @texasbruce As phrased, the possible duplicate does seem like a good fit, but because Fraz is looking to parse RDF documents, it's likely that the format is more complicated than the question suggests. I don't think that the answers to that question will necessarily be adequate for Fraz's _actual_ task (though it's a perfect fit for the question as asked). – Joshua Taylor Aug 26 '13 at 20:41

2 Answers


As others have suggested, if you want to parse RDF graphs, just use a library like Apache Jena (disclaimer: I am one of the developers).

If you instead need direct control over the parsing process, there are a couple of options:

  • Jena has a TokenizerText class which can tokenize N-Triples/Turtle/SPARQL-like data if you want to work with the data at the textual level
  • You can implement the StreamRDF interface and use it with the built-in parsers to control what happens to the data as it is parsed at the triple/quad level
RobV
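// Sample line from the question; splitting on "> " consumes the closing bracket of each term.
String string = "<foo bar> <property abc> <this foo bar> .";
String[] array = string.split("> ");

// Re-append the ">" that split() removed from every token except the last.
for (int i = 0; i < array.length - 1; i++) {
    System.out.println(array[i] + ">");
}
// The final token is the "." terminator, which never had a ">".
System.out.println(array[array.length - 1]);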
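For comparison, here is a minimal regex-based sketch that returns the four parts exactly as the question lists them (bracket contents plus the trailing `.`). The class name `TripleTokenizer` and the method `tokenize` are hypothetical names chosen for illustration. It assumes every term is written as `<...>` and the line ends with `.`; as noted in the comments above, real RDF data can contain literals and other syntax that this does not handle, so a dedicated RDF parser is the safer choice there.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of any RDF library.
public class TripleTokenizer {
    // Matches either <...> (capturing the text inside the brackets)
    // or a "." that is not inside a bracketed term.
    private static final Pattern TOKEN = Pattern.compile("<([^>]*)>|\\.");

    public static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            // group(1) is null when the match was the "." terminator.
            tokens.add(m.group(1) != null ? m.group(1) : m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Sample line taken from the question.
        System.out.println(tokenize("<foo bar> <property abc> <this foo bar> ."));
    }
}
```

Unlike the `split("> ")` approach, this does not depend on exactly one space following each `>`, and a `.` inside a bracketed term is not treated as a separate token.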
Brinnis