How can I parse the exported bookmarks files from Google Chrome and Mozilla Firefox in Java? Are there any libraries available that parse them directly and return the URLs in them?
Sample code for parsing them in Java is also most welcome.
In most cases, you don't really need to parse the HTML file. Chrome stores its bookmarks in a JSON file, and it's a lot simpler to read that file with a JSON parser.
The file you are interested in is located at (on Linux, anyway; the location differs on other operating systems):
/home/your_name/.config/google-chrome/Default/Bookmarks
JSON parsing is easy. Search around, or start with "How to parse JSON in Java".
If you want to visualize the JSON data before you start digging through it, also have a look at http://chris.photobooks.com/json/default.htm.
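Per the new comments posted, the solution would be to use jsoup, an open-source Java HTML parser. jsoup can read the exported bookmark HTML straight from disk with Jsoup.parse(File, String charsetName), so hosting the file is not required. Selecting a[href] on the resulting document and reading each element's href attribute yields the URLs. If the file is already hosted on a local server such as Tomcat, Jsoup.connect(...).get() can fetch it instead from
http://yourip:<port>/<yourProject>/<bookmark.html>.
The jsoup API is pretty self-explanatory.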
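If adding a library is not an option, the exported file is regular enough that a quick extraction can be done with the standard library alone. The following is a minimal sketch, not a full HTML parser: it assumes every bookmark appears as an anchor tag with a double-quoted HREF attribute, as in the Netscape-style export. The class name BookmarkHtmlLinks and the method extractUrls are made up for this example.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls HREF values out of an exported bookmarks HTML file.
public class BookmarkHtmlLinks {

    // Matches <A ... HREF="..."> regardless of case; assumes double-quoted attribute values.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the HREF value of every anchor tag, in document order. */
    public static List<String> extractUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: java BookmarkHtmlLinks <exported bookmarks file>");
            return;
        }
        // Assumes the export is UTF-8 encoded.
        String html = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        extractUrls(html).forEach(System.out::println);
    }
}
```

HTML entities such as &amp; inside a URL are returned undecoded by this sketch; a real HTML parser such as jsoup handles those cases.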
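Other, simpler ways:
Chrome stores its bookmarks as JSON like the sample below. Firefox can also back up bookmarks to a JSON file, but its structure is different, so check the keys before reusing the same code.
Java way: I would suggest you use a JSON parser for these. Make a reference Java object based on the structure below.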
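Or simply use a UNIX command prompt and match only the "url" keys (a plain grep for url also matches the "type": "url" lines, and cutting on ":" would truncate each URL after "http"):
grep '"url":' <bookmark file path> | cut -d'"' -f4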
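The same quick extraction as the grep approach can be mirrored in plain Java. This is a minimal sketch that uses a regular expression in place of a real JSON parser, so it is only suitable for pulling out the values of "url" keys. The class name ChromeBookmarkUrls and the method extractUrls are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: grep-like extraction of "url" values from Chrome's Bookmarks file.
public class ChromeBookmarkUrls {

    // A "url" key, a colon, then a quoted string; backslash escapes inside the string are skipped over.
    private static final Pattern URL_FIELD = Pattern.compile(
            "\"url\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Returns the raw (still JSON-escaped) value of every "url" key, in file order. */
    public static List<String> extractUrls(String json) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL_FIELD.matcher(json);
        while (m.find()) {
            urls.add(m.group(1));
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: java ChromeBookmarkUrls <path to Bookmarks file>");
            return;
        }
        String json = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        extractUrls(json).forEach(System.out::println);
    }
}
```

Because the pattern requires a colon after "url", entries such as "type": "url" are not matched. JSON escape sequences inside a URL are left as written, which is one reason to prefer a proper JSON library for anything beyond a quick look.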
However, if you are still interested in doing this with the Chrome APIs, please visit: http://developer.chrome.com/extensions/bookmarks.html
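For the "Java way" mentioned above, the reference object could look roughly like the sketch below. Its field names follow the keys in the JSON sample that follows, so a JSON binding library of your choice could populate it; that wiring is not shown. The class name BookmarkNode and its helper methods are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reference object mirroring one node of Chrome's bookmark tree.
public class BookmarkNode {
    public String id;
    public String name;
    public String type;        // "url" for a bookmark, "folder" for a folder
    public String url;         // only set on "url" nodes
    public String date_added;  // the sample stores this as a string
    public List<BookmarkNode> children = new ArrayList<>(); // only filled on folders

    /** Convenience factory for a bookmark entry. */
    public static BookmarkNode url(String name, String url) {
        BookmarkNode n = new BookmarkNode();
        n.type = "url";
        n.name = name;
        n.url = url;
        return n;
    }

    /** Convenience factory for a folder with the given children. */
    public static BookmarkNode folder(String name, BookmarkNode... children) {
        BookmarkNode n = new BookmarkNode();
        n.type = "folder";
        n.name = name;
        n.children.addAll(Arrays.asList(children));
        return n;
    }

    /** Depth-first walk that collects the URL of every "url" node at or below the given node. */
    public static List<String> collectUrls(BookmarkNode node) {
        List<String> urls = new ArrayList<>();
        collect(node, urls);
        return urls;
    }

    private static void collect(BookmarkNode node, List<String> out) {
        if (node == null) {
            return;
        }
        if ("url".equals(node.type) && node.url != null) {
            out.add(node.url);
        }
        if (node.children != null) {
            for (BookmarkNode child : node.children) {
                collect(child, out);
            }
        }
    }
}
```

Each entry under "roots" (for example "bookmark_bar") would be one such node, and calling collectUrls on each root gives the full list of URLs, including those in nested folders.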
{
   "checksum": "702d8e600a3d70beccfc78e82ca7caba",
   "roots": {
      "bookmark_bar": {
         "children": [ {
            "date_added": "12939920104154671",
            "id": "3",
            "name": "Development/Tutorials/Git/git-svn - KDE TechBase",
            "type": "url",
            "url": "http://techbase.kde.org/Development/Tutorials/Git/git-svn"
         }, {
            "date_added": "12939995405838705",
            "id": "4",
            "name": "QJson - Usage",
            "type": "url",
            "url": "http://qjson.sourceforge.net/usage.html"
I am a bit late to this question, but in case it is still relevant: I needed to do the same (and also handle other bookmark sources: GitHub Stars, Netscape and Google Bookmarks), so I built my own. You can look at it and take what you need from my repo: https://github.com/IvoLimmen/mystart.
If somebody is interested, here's a Scala snippet showing how you could tackle parsing Chrome's bookmarks JSON file (not thoroughly tested, just to give the idea):
import org.json4s.DefaultFormats
import org.json4s.native.JsonMethods
import org.junit.Test

class BookmarksImporterTest {

  implicit val formats: DefaultFormats.type = DefaultFormats

  // Recursively collects every "url" node below the given node.
  def analyse(element: Node): List[Node] = {
    element.children.flatMap(c => {
      c.`type` match {
        // Recurse on the folder itself so that its direct "url" children are kept.
        case Some("folder") => analyse(c)
        case Some("url") => List(c)
        case _ => println("???"); List() // unknown node type, skipped
      }
    })
  }

  @Test
  def test(): Unit = {
    val source = scala.io.Source.fromFile("bookmarks.json")
    try {
      val json = JsonMethods.parse(source.reader())
      val bookmarks = json.extract[ChromeBookmarks]
      val bms = bookmarks.roots.flatMap {
        case (_, root) => analyse(root)
      }
      println("found " + bms.size + " entries")
    } finally {
      source.close() // release the file handle
    }
  }
}

case class ChromeBookmarks(checksum: String, roots: Map[String, Node], version: Int)

// Field names must match the JSON keys (date_added, not date-added).
// Chrome writes the timestamps as strings, so they are kept as Option[String].
case class Node(
  id: Option[String],
  name: Option[String],
  url: Option[String],
  children: List[Node],
  date_added: Option[String],
  date_modified: Option[String],
  `type`: Option[String]
)