How can I parse specific text from a website in Java? For example, if the page contains <meta property='ABC' content="DEF" />, I would like to search for 'ABC' and find 'DEF'. How can I write a function that does this? I have no experience with HTML or with parsing.

Thanks

Bill the Lizard

2 Answers


I like jsoup, as it adds a lot of nice features.

jsoup takes care of fetching and parsing the document for you. Once you have it, you can use CSS selectors to access elements within the page. For the tag in the question, a selector such as meta[property=ABC] matches the element, and its content attribute holds 'DEF'.

As for meta tag support, I can't write any test code right now, but this example on Stack Overflow talks a bit about it.

Community
buzzsawddog

I don't do a lot of Java, but this sounds like a good place to use regular expressions. For text searches like this one, they are pretty simple: to search for 'ABC', use the regex ABC, and to match either 'ABC' or 'DEF', use ABC|DEF. I'm not sure exactly what you want, but if you clarify I can help more.

Java, like most programming languages, has classes for evaluating these expressions:

import java.util.regex.Pattern;
import java.util.regex.Matcher;

For information about how to use these, see this link. It provides pretty much everything you need, including an introduction to regexes.

To learn in more detail about regular expression syntax, go here.

There are other ways to search through strings for patterns, but regular expressions work in much the same way across most languages (the syntax differs in details), and they become more and more useful as the patterns you look for become more complex.
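As a rough sketch of how the two classes above could be applied to the tag in the question, the snippet below uses a capturing group to pull out the content value. The class name MetaContentFinder and the helper findContent are made up for illustration. It assumes the property attribute comes directly before content and that the value contains no quotes; real-world HTML often breaks such assumptions, which is why a parser is usually the more robust choice.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MetaContentFinder {

    // Hypothetical helper: returns the content value of the first <meta> tag
    // whose property attribute equals the given name.
    // Assumes property is immediately followed by content inside the tag.
    static Optional<String> findContent(String html, String property) {
        // Pattern.quote keeps regex metacharacters in the property name literal.
        Pattern pattern = Pattern.compile(
            "<meta\\s+property\\s*=\\s*['\"]" + Pattern.quote(property) + "['\"]"
                + "\\s+content\\s*=\\s*['\"]([^'\"]*)['\"]");
        Matcher matcher = pattern.matcher(html);
        // group(1) is the text captured by the parentheses in the pattern.
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String html = "<html><head><meta property='ABC' content=\"DEF\" /></head></html>";
        System.out.println(findContent(html, "ABC").orElse("not found"));
    }
}
```

Returning an Optional makes the "tag not found" case explicit, so the caller does not have to check for null.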

mobyvb