
We have already fetched the URLs and stored them in the database using the jsoup library. Now we want to extract data from those pages and store it in the database, but only specific fields rather than the whole page. For example, when we fetch http://www.flipkart.com/shoes/, we need fields such as brands, prices, reviews, etc. How can we do this in Java? Please help!



There are two ways to pull just the fields you need out of the whole page content:

  1. Apply a regex to the response content and extract the needed fields.
  2. Use XPath to extract the needed fields (the preferred and recommended way of parsing).

Ex: 1 - Regex

  1. Write a regex pattern that matches the fields on the page you selected.
  2. Get the response as a String, apply the pattern, and retrieve the data.
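The regex steps can be sketched as follows. This is a minimal illustration, not Flipkart's real markup: the `<span class="brand">` / `<span class="price">` structure and the helper name `extractField` are hypothetical assumptions, so you would need to inspect the actual page source and adjust the pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexExtract {

    // Hypothetical helper: returns the text of every <span class="cssClass">...</span>.
    // Assumes the class attribute is written exactly like this and the span has no nested tags.
    public static List<String> extractField(String html, String cssClass) {
        Pattern pattern = Pattern.compile(
                "<span class=\"" + Pattern.quote(cssClass) + "\">\\s*([^<]+?)\\s*</span>");
        List<String> values = new ArrayList<>();
        Matcher m = pattern.matcher(html);
        while (m.find()) {
            values.add(m.group(1)); // group 1 is the text between the tags, whitespace-trimmed
        }
        return values;
    }

    public static void main(String[] args) {
        // Made-up sample markup standing in for the fetched page content
        String html = "<div><span class=\"brand\">Nike</span>"
                + "<span class=\"price\">Rs. 2,999</span></div>"
                + "<div><span class=\"brand\">Puma</span>"
                + "<span class=\"price\"> Rs. 1,499 </span></div>";
        System.out.println(extractField(html, "brand")); // [Nike, Puma]
        System.out.println(extractField(html, "price")); // [Rs. 2,999, Rs. 1,499]
    }
}
```

This works only while the markup stays exactly as the pattern expects; an extra class, a different attribute order, or a nested tag silently breaks it, which is why regex is generally discouraged for HTML.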

Ex: 2 - XPath

  1. Work out how to locate each HTML element (or list of elements) uniquely.
  2. Get the response in HTML/XML form, apply the XPath expressions to the retrieved content, and read the data.
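The XPath steps can be sketched with the JDK's built-in `javax.xml.xpath` API. This is a minimal illustration under two assumptions: the markup (a `div class="product"` containing `brand` and `price` spans) is hypothetical, and the input is well-formed XHTML. Real pages are often not valid XML; in that case you could let jsoup parse the page first and convert it to a W3C `Document` (for example with jsoup's `org.jsoup.helper.W3CDom`) before applying XPath.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathExtract {

    // Parses a well-formed XHTML string into a DOM Document (assumes valid XML input).
    public static Document parse(String xhtml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
    }

    // Evaluates an XPath expression and returns the trimmed text of every matching node.
    public static List<String> extractField(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent().trim());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample markup standing in for the fetched page content
        String xhtml = "<html><body>"
                + "<div class=\"product\"><span class=\"brand\">Nike</span>"
                + "<span class=\"price\">Rs. 2,999</span></div>"
                + "<div class=\"product\"><span class=\"brand\">Puma</span>"
                + "<span class=\"price\">Rs. 1,499</span></div>"
                + "</body></html>";
        Document doc = parse(xhtml);
        System.out.println(extractField(doc, "//div[@class='product']/span[@class='brand']")); // [Nike, Puma]
        System.out.println(extractField(doc, "//div[@class='product']/span[@class='price']")); // [Rs. 2,999, Rs. 1,499]
    }
}
```

Because XPath navigates the parsed document tree rather than matching raw text, it is unaffected by whitespace or attribute order, and each expression maps naturally to one column you want to store in the database.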
Vikrant Kashyap
Hakuna Matata
    Regex should not be used to parse html. http://stackoverflow.com/a/6751339/1176178 – Zack Aug 02 '16 at 13:08