8

How can I parse the exported bookmarks file from Google Chrome and Mozilla Firefox in Java? Are there any libraries available to parse them directly and obtain the URLs in them?

Sample code for parsing them in Java is also most welcome.

Lijo John

4 Answers

7

In most cases, you don't really need to parse the HTML file. Chrome stores its bookmarks in a JSON file. It's a lot simpler to just read that file using a JSON parser.

On Linux, the file you are interested in is located at the path below (search for the location on other operating systems):

/home/your_name/.config/google-chrome/Default/Bookmarks
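If you want to build that location in Java rather than hard-code a user name, something like the following should do. It is only a sketch for the default Linux profile path given above; the class and method names are made up for illustration.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class ChromeBookmarksLocation {

    // Builds <home>/.config/google-chrome/Default/Bookmarks.
    // Assumes the default Chrome profile on Linux, as in the path above.
    static Path bookmarksFile(String homeDir) {
        return Paths.get(homeDir, ".config", "google-chrome", "Default", "Bookmarks");
    }

    public static void main(String[] args) {
        Path file = bookmarksFile(System.getProperty("user.home"));
        System.out.println(file + " exists: " + Files.exists(file));
    }
}
```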

JSON parsing is easy. Google around or start with How to parse JSON in Java.

If you want to visualize JSON data before you start digging through it, then also have a look at http://chris.photobooks.com/json/default.htm.

mpenkov
5

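Per the new comments, one solution is to use jsoup, an open source HTML parser. Jsoup.connect accepts only HTTP or HTTPS URLs, so to use it you would host the exported bookmark HTML on a local server such as Tomcat and fetch its DOM from a URL like the one below. Alternatively, Jsoup.parse(File, String) reads a local file directly, so the server is not strictly required.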

 http://yourip:<port>/<yourProject>/<bookmark.html>. 

JSOUP is pretty self-explanatory.
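If you cannot add a dependency, the exported bookmark HTML is regular enough that a plain regex can pull out the links. To be clear, this is a stand-in for jsoup, not jsoup itself, and it is only a rough sketch: it assumes each bookmark is an A tag with a double-quoted HREF attribute, as in typical exports. The class and method names are my own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BookmarkHtmlLinks {

    // Matches the double-quoted HREF attribute of an <A ...> tag, ignoring case.
    private static final Pattern HREF =
            Pattern.compile("<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Returns the HREF values in document order.
    static List<String> extractUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Hypothetical two-line sample in the style of an exported bookmarks file.
        String sample =
                "<DT><H3>Folder</H3>\n"
              + "<DT><A HREF=\"http://qjson.sourceforge.net/usage.html\" ADD_DATE=\"1\">QJson - Usage</A>\n";
        for (String url : extractUrls(sample)) {
            System.out.println(url);
        }
    }
}
```

For a real file you would read the whole export into a String first (for example with Files.readString on Java 11 or later) and pass it to extractUrls.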

Other, simpler ways:

Chrome stores its bookmarks as JSON (and Firefox can back up its bookmarks to JSON); a Chrome sample is shown at the end of this answer.

Java way: I would suggest using a JSON parser for these. Create a Java class that mirrors the structure in that sample.
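A reference object for that structure might look roughly like this. The field names follow the keys in the Chrome sample at the end of this answer; the "folder" type value and the class itself are assumptions for illustration. How you fill the tree depends on the JSON library you pick, so the sketch only shows the class and the recursive walk.

```java
import java.util.ArrayList;
import java.util.List;

class BookmarkNode {
    String name;
    String type;   // "url" for a bookmark; assumed "folder" for a container
    String url;    // null for folders
    List<BookmarkNode> children = new ArrayList<>();

    BookmarkNode(String name, String type, String url) {
        this.name = name;
        this.type = type;
        this.url = url;
    }

    // Depth-first walk that collects the url of every node of type "url".
    static List<String> collectUrls(BookmarkNode node) {
        List<String> urls = new ArrayList<>();
        if ("url".equals(node.type) && node.url != null) {
            urls.add(node.url);
        }
        for (BookmarkNode child : node.children) {
            urls.addAll(collectUrls(child));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Tree built by hand here; normally a JSON parser would populate it.
        BookmarkNode bar = new BookmarkNode("bookmark_bar", "folder", null);
        bar.children.add(new BookmarkNode("QJson - Usage", "url",
                "http://qjson.sourceforge.net/usage.html"));
        for (String url : collectUrls(bar)) {
            System.out.println(url);
        }
    }
}
```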

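Or simply run the following from a UNIX command prompt (it matches the "url" keys only and takes the quoted value, so the colon inside http:// does not cut the URL short):

 grep '"url":' <bookmark file path> | cut -d'"' -f4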
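The same text-matching idea can be sketched in plain Java without a JSON library. This is a shortcut, not real JSON parsing: it assumes each URL appears as a "url" key with a string value that contains no escaped quotes. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BookmarkUrlLines {

    // Matches a "url" key followed by a quoted value; "type": "url" does not
    // match because there the word url is a value, not a key before a colon.
    private static final Pattern URL_FIELD =
            Pattern.compile("\"url\"\\s*:\\s*\"([^\"]*)\"");

    static List<String> extractUrls(String json) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_FIELD.matcher(json);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // One entry copied from the sample bookmarks JSON.
        String sample =
                "\"name\": \"QJson - Usage\",\n"
              + "\"type\": \"url\",\n"
              + "\"url\": \"http://qjson.sourceforge.net/usage.html\"\n";
        for (String url : extractUrls(sample)) {
            System.out.println(url);
        }
    }
}
```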

However, if you are still interested in doing this with the Chrome APIs, see: http://developer.chrome.com/extensions/bookmarks.html

{
   "checksum": "702d8e600a3d70beccfc78e82ca7caba",
   "roots": {
  "bookmark_bar": {
     "children": [ {
        "date_added": "12939920104154671",
        "id": "3",
        "name": "Development/Tutorials/Git/git-svn - KDE TechBase",
        "type": "url",
        "url": "http://techbase.kde.org/Development/Tutorials/Git/git-svn"
     }, {
        "date_added": "12939995405838705",
        "id": "4",
        "name": "QJson - Usage",
        "type": "url",
        "url": "http://qjson.sourceforge.net/usage.html"
user1428716
  • What you need to search for is HTML parsing in Java. Use the export bookmarks option provided by the web browser to get a sample bookmark HTML file. – Lijo John Feb 22 '13 at 06:09
0

I am a bit late to this question, but in case it is still relevant: I needed to do the same (and also handle other bookmark sources: GitHub Stars, Netscape and Google Bookmarks) and built my own. You can take a look and use it from my repo: https://github.com/IvoLimmen/mystart.

Ivo Limmen
0

If somebody is interested: here's a Scala snippet showing how you could tackle parsing Chrome's bookmarks JSON file (not thoroughly tested, just to give the idea):

import org.json4s.DefaultFormats
import org.json4s.native.JsonMethods
import org.junit.Test

class BookmarksImporterTest {

  implicit val formats: DefaultFormats.type = DefaultFormats

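  // Recursively collects every "url" node below the given element.
  def analyse(element: Node): List[Node] = {
    element.children.flatMap(c => {
      c.`type` match {
        // Recurse into the folder itself so that its direct children are visited.
        case Some("folder") => analyse(c)
        case Some("url")    => List(c)
        case _              => println("???"); List()
      }
    })
  }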

  @Test
  def test(): Unit = {
    val source    = scala.io.Source.fromFile("bookmarks.json")
    val json      = JsonMethods.parse(source.reader())
    val bookmarks = json.extract[ChromeBookmarks]

    val bms = bookmarks.roots.flatMap {
      case (name, elements) => analyse(elements)
    }
    println("found " + bms.size + " entries")
  }

}

case class ChromeBookmarks(checksum: String, roots: Map[String, Node], version: Int)

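// Field names must match the JSON keys: Chrome's file uses underscores
// (date_added) and writes the timestamps as strings.
case class Node(
    id: Option[String],
    name: Option[String],
    url: Option[String],
    children: List[Node],
    date_added: Option[String],
    date_modified: Option[String],
    `type`: Option[String]
)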
evandor