We have already fetched the URLs and stored them in the DB using the jsoup library. Now we want to extract the data and store it in the DB, but only specific fields rather than the whole page. For example, when we fetch http://www.flipkart.com/shoes/, we need fields like brands, prices, reviews, etc. How can we do this in Java? Please help!
There are two ways you can filter the whole content down to the fields you need:
- Apply a regex to the response content and extract the needed fields.
- Use XPath to extract the needed fields (the preferred and recommended way of parsing).
Ex: 1 - Regex
- Write a regex pattern for the page you selected.
- Get the response as a String, apply the pattern to it, and retrieve the data.
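A minimal sketch of the regex approach in plain Java. It assumes (hypothetically) that each field value sits in a `<span>` whose `class` attribute names the field, e.g. `<span class="price">`. The real page's markup is not known here, so the pattern would have to be adapted to it. `RegexFieldExtractor` and `extract` are illustrative names, not part of any library. Regex is fragile on real HTML (attribute order, extra classes, nested tags all break it), so treat this as an illustration of the idea only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexFieldExtractor {

    /**
     * Returns the text of every <span class="fieldClass">...</span> in html, in document order.
     * ASSUMPTION (hypothetical markup): each field value is wrapped in a span whose class
     * attribute is exactly fieldClass and which contains no nested tags.
     */
    public static List<String> extract(String html, String fieldClass) {
        Pattern pattern = Pattern.compile(
                "<span class=\"" + Pattern.quote(fieldClass) + "\">\\s*([^<]+?)\\s*</span>");
        List<String> values = new ArrayList<>();
        Matcher m = pattern.matcher(html);
        while (m.find()) {
            // Group 1 is the text between the tags; the surrounding \s* drop outer whitespace.
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        // Hypothetical snippet standing in for the fetched page content.
        String html = "<div><span class=\"brand\">Acme</span>"
                + "<span class=\"price\">Rs. 999</span>"
                + "<span class=\"price\"> Rs. 1,499 </span></div>";
        System.out.println(extract(html, "brand")); // [Acme]
        System.out.println(extract(html, "price")); // [Rs. 999, Rs. 1,499]
    }
}
```

Each field (brand, price, review) needs its own pattern, and every pattern breaks as soon as the site changes its markup.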
Ex: 2 - XPath
- Identify a way to locate each HTML element (or list of elements) uniquely.
- Get the response in HTML/XML form, apply the XPath expression to the retrieved content, and get the data.

Vikrant Kashyap

Hakuna Matata
Regex should not be used to parse HTML. http://stackoverflow.com/a/6751339/1176178 – Zack Aug 02 '16 at 13:08