
I want to write a Python script that can keep track of which webpages have been opened in my web browser (Mozilla Firefox 23). I don't know where to start. Python's standard webbrowser module can open webpages, but its documentation says nothing about interacting with them.

So do I need to write a plugin for my browser that sends the data to my Python script, or am I missing some functionality in the standard library?

I have looked at some related questions like this, but they are all about simulating a web browser in Python using mechanize and/or selenium. I don't want to do that. I want to get data from my webbrowser using standard Python libraries.

EDIT

Just to add some more clarity to the question: I want to keep track of the webpages currently open in Firefox.

Aseem Bansal
  • Constructive Criticism: Placing a comment explaining the downvote never hurts. It can help in avoiding bad questions in the future. – Aseem Bansal Aug 26 '13 at 08:27
  • **This is for anyone voting to close it as too broad**. I have specifically written "I want to get data from my webbrowser using **standard Python libraries**." I tagged it as Python3, and Python has a philosophy that **there should be one way to do things**; Python3 removed many of the redundancies in Python2. How is it too broad, then? Please explain. I would like to hear the explanation. – Aseem Bansal Aug 26 '13 at 09:23

2 Answers


This answer may be a bit fuzzy -- that is because the question is not extremely specific.

If I understand it correctly, you want to examine the history of visited pages. The problem is that this is not directly related to HTML, to the HTTP protocol, or to web services. The history (which you can view in Firefox by pressing Ctrl-H) is a feature implemented by Firefox itself, so it is entirely implementation dependent. No standard library can be expected to extract that information.

As for the HTTP protocol and the HTML content of the pages, there is no such thing as interacting with the pages. The browser sends a GET request with the URL as its argument, and the web server sends back the text body along with some metadata (headers). The caller (the browser) can do anything it likes with the returned data: it interprets the tagged text and renders it as a readable document, as nicely as possible. Interaction (clicking on an href link) is implemented by the browser, and it simply causes further HTTP GET requests.
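To illustrate that an HTTP exchange is just text going back and forth, here is a minimal sketch that builds a raw GET request and splits a raw response into status code, headers, and body. The function names are hypothetical, and it is deliberately simplified (no chunked transfer encoding, no redirects); in real code the standard http.client or urllib.request modules handle this for you.

```python
def build_get_request(host, path='/'):
    """Return the raw bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        'GET {} HTTP/1.1'.format(path),
        'Host: {}'.format(host),
        'Connection: close',
        '',  # the empty line that ends the header block
        '',
    ]
    return '\r\n'.join(lines).encode('ascii')


def parse_response(raw):
    """Split a raw HTTP response into (status_code, headers, body)."""
    head, _, body = raw.partition(b'\r\n\r\n')
    status_line, *header_lines = head.decode('iso-8859-1').split('\r\n')
    status_code = int(status_line.split()[1])  # e.g. 'HTTP/1.1 200 OK' -> 200
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(':')
        headers[name.strip().lower()] = value.strip()
    return status_code, headers, body
```

Whatever the browser later does with the body (rendering, following links) happens entirely on its side; the server only ever sees further requests like the one above.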

To answer your question, you need to find out how Mozilla Firefox 23 stores the history. It is likely stored somewhere in Firefox's internal SQLite databases.
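If you are after the history rather than the open tabs, the standard sqlite3 module can read such a database. The sketch below assumes the history lives in the profile's places.sqlite file, in a moz_places table with url, title, and last_visit_date columns (microseconds since the Unix epoch); please verify this against your Firefox version. Firefox may keep the database locked while it is running, so work on a copy, as with sessionstore.js. The function name recent_history is hypothetical.

```python
import sqlite3
from datetime import datetime, timezone


def recent_history(db_path, limit=20):
    """Return (url, title, last_visit) tuples, most recent first.

    Assumes a Firefox places.sqlite file whose moz_places table has url,
    title and last_visit_date columns, with last_visit_date stored as
    microseconds since the Unix epoch.
    """
    query = (
        'SELECT url, title, last_visit_date FROM moz_places '
        'WHERE last_visit_date IS NOT NULL '
        'ORDER BY last_visit_date DESC LIMIT ?'
    )
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(query, (limit,)).fetchall()
    finally:
        conn.close()  # using the connection in a "with" block would not close it
    return [
        (url, title, datetime.fromtimestamp(usec / 1_000_000, tz=timezone.utc))
        for url, title, usec in rows
    ]
```

Usage: `for url, title, when in recent_history('places.sqlite'): print(when, url, title)`.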

Update 2015-08-24: See erasmortg's comment about the change in where Firefox stores this information. (The text below predates this update.)

Update: The list of open tabs is stored per user profile. As you probably want this for Windows, you should first get the path, something like c:\Users\myname.mydomain\AppData\Roaming\Mozilla\Firefox\Profiles\yoodw5zk.default-1375107931124\sessionstore.js. The profile name should probably be extracted from c:\Users\myname.mydomain\AppData\Roaming\Mozilla\Firefox\profiles.ini. I simply copied sessionstore.js to try getting the data. Despite the .js extension, its content could be parsed with the standard json module. You basically get a dictionary. The value under the key 'windows' is a list of dictionaries (one per window), and each window's 'tabs' item is in turn a list with information about the tabs.
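The profile folder name (yoodw5zk.default-... in the path above) differs per installation. A minimal sketch of looking it up with the standard configparser module follows. It assumes profiles.ini contains [ProfileN] sections with Path and IsRelative keys, and that the default profile is marked with Default=1; the function name default_profile_dir is hypothetical.

```python
import configparser
import os


def default_profile_dir(firefox_dir):
    """Return the folder of the default profile listed in profiles.ini."""
    ini = configparser.ConfigParser(interpolation=None)
    ini.read(os.path.join(firefox_dir, 'profiles.ini'), encoding='utf-8')
    profiles = [ini[name] for name in ini.sections() if name.startswith('Profile')]
    if not profiles:
        raise LookupError('no [Profile...] section found in profiles.ini')
    # Prefer the profile marked Default=1; otherwise take the first one.
    chosen = next((p for p in profiles if p.get('Default') == '1'), profiles[0])
    path = chosen['Path']
    # IsRelative=1 means Path is relative to the folder holding profiles.ini.
    if chosen.get('IsRelative', '1') == '1':
        path = os.path.join(firefox_dir, path)
    return os.path.normpath(path)
```

On Windows, firefox_dir would typically be `os.path.join(os.environ['APPDATA'], 'Mozilla', 'Firefox')`, and sessionstore.js would then be inside the returned folder.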

Copy your sessionstore.js to a working directory and execute the following script there:

#!python3

import json

with open('sessionstore.js', encoding='utf-8') as f:
    content = json.load(f)

# The loaded content is a dictionary. List the keys first (console).
for k in content:
    print(k)

# Now list the content bound to the keys. As the console may not be able
# to display all characters, write it to a file.
with open('out.txt', 'w', encoding='utf-8') as f:

    # Write the overview of the content.
    for k, v in content.items():
        # Write the key and the type of the value.
        f.write('\n\n{}:  {}\n'.format(k, type(v)))

        # The value could be of a list type, or just one item.
        if isinstance(v, list):
            for e in v:
                f.write('\t{}\n'.format(e))
        else:
            f.write('\t{}\n'.format(v))

    # Write the content of the tabs in each window.
    f.write('\n\n=======================================================\n\n')
    windows = content['windows']
    for n, w in enumerate(windows, 1):  # enumerate is used just for numbering the windows
        f.write('\n\tWindow {}:\n'.format(n))
        for tab in w['tabs']:
            # Each tab is a dictionary. Its 'entries' list is the tab's
            # back/forward history, and 'index' (1-based) points to the entry
            # currently displayed. Show only that entry's 'url' and 'title'.
            entries = tab.get('entries', [])
            if not entries:
                continue
            e = entries[tab.get('index', len(entries)) - 1]
            f.write('\t\t{}\n\t\t{}\n\n'.format(e.get('url', ''), e.get('title', '')))

The keys are printed to the console (a few lines), and the details are written to out.txt in the working directory. In my case, the end of out.txt contains something like this:

Window 1:
    http://www.cyrilmottier.com/
    Cyril Mottier

    http://developer.android.com/guide/components/fragments.html#CommunicatingWithActivity
    Fragments | Android Developers

    http://developer.android.com/guide/components/index.html
    App Components | Android Developers

    http://www.youtube.com/watch?v=ONaD1mB8r-A
    ▶ Introducing RoboSpice: A Robust Asynchronous Networking Library for Android - YouTube

    http://www.youtube.com/watch?v=5a91dBLX8Qc
    Rocking the Gradle with Hans Dockter - YouTube

    http://stackoverflow.com/questions/18439564/how-to-keep-track-of-webpages-opened-in-web-browser-using-python
    How to keep track of webpages opened in web-browser using Python? - Stack Overflow

    https://www.google.cz/search?q=Mozilla+firefox+list+of+open+tabs&ie=utf-8&oe=utf-8&rls=org.mozilla:cs:official&client=firefox-a&gws_rd=cr
    Mozilla firefox list of open tabs - Hledat Googlem

    https://addons.mozilla.org/en-US/developers/docs/sdk/latest/dev-guide/tutorials/list-open-tabs.html
    List Open Tabs - Add-on SDK Documentation

    https://support.mozilla.org/cs/questions/926077
    list all tabs button not showing | Fórum podpory Firefoxu | Podpora Mozilly

    https://support.mozilla.org/cs/kb/scroll-through-your-tabs-quickly
    Scroll through your tabs quickly | Nápověda k Firefox
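Since the goal is to keep track of open pages over time, one option is to re-read the session file periodically and report URLs that were not seen before. A minimal sketch follows; current_tabs and watch are hypothetical names, and it assumes each tab's 'index' field is the 1-based position of the currently displayed entry in its 'entries' list.

```python
import json
import time


def current_tabs(session):
    """Return (url, title) of the entry currently shown in each tab."""
    result = []
    for window in session.get('windows', []):
        for tab in window.get('tabs', []):
            entries = tab.get('entries', [])
            if not entries:
                continue
            # Assumption: 'index' is the 1-based position of the displayed
            # entry; clamp it in case it is missing or out of range.
            i = min(max(tab.get('index', len(entries)), 1), len(entries)) - 1
            entry = entries[i]
            result.append((entry.get('url', ''), entry.get('title', '')))
    return result


def watch(path, interval=15.0):
    """Re-read the session file every `interval` seconds; print new URLs."""
    seen = set()
    while True:
        try:
            with open(path, encoding='utf-8') as f:
                session = json.load(f)
        except (OSError, ValueError):
            session = {}  # file missing or caught mid-write; try again later
        for url, title in current_tabs(session):
            if url not in seen:
                seen.add(url)
                print(url, title)
        time.sleep(interval)
```

watch() runs until interrupted (Ctrl-C). Polling is a design trade-off: Firefox only rewrites the file periodically, so very short-lived tabs may be missed.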
pepr
  • Just to add some more clarity to the question: I want to keep track of the webpages currently open in Firefox. I don't think that would change your answer, but just in case, does it? – Aseem Bansal Aug 26 '13 at 15:06
  • Firefox stores the URLs of the open pages somewhere. You may have noticed that FF can recover them after a crash. You can also configure FF to reopen the pages that were open when it was closed. I guess it will be quite easy to get that list of URLs. – pepr Aug 26 '13 at 17:05
  • If you feel like updating your answer: the tabs are currently stored in a `js` file named `recovery.js`, updated every 15 seconds ([see here for details](https://dutherenverseauborddelatable.wordpress.com/2014/06/26/firefox-the-browser-that-has-your-backup/)). The path is the same as before, with an added folder at the end: `\sessionstore-backups\`. The rest of your answer still stands. – erasmortg Aug 23 '15 at 10:51
  • @erasmortg: Thanks for the info. I have just added the update note and pointed to your comment. Have a good time ;) – pepr Aug 24 '15 at 14:00
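Following erasmortg's comment, a script may need to look in either location. A minimal sketch that prefers sessionstore-backups/recovery.js and falls back to the older sessionstore.js in the profile folder; the function name find_session_file is hypothetical, and other Firefox versions may use yet another file name.

```python
import os


def find_session_file(profile_dir):
    """Return the first existing session file in the profile, or None."""
    candidates = [
        # Newer location: a backup refreshed periodically while Firefox runs.
        os.path.join(profile_dir, 'sessionstore-backups', 'recovery.js'),
        # Older location used by Firefox 23.
        os.path.join(profile_dir, 'sessionstore.js'),
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```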

You want to keep track of the web pages opened in FF through Python. So why not write a web proxy in Python and configure Firefox to use it?

Then you can filter all the HTTP requests that Firefox sends using regular expressions, and store them in a file or database.
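A minimal sketch of such a proxy using only the standard library (http.server and urllib.request). It handles plain-HTTP GET requests only: HTTPS pages are requested through the CONNECT method, which this handler does not implement (it answers 501), and even a tunnelling proxy would see only host:port for them, not full URLs. URL_FILTER, LOG_FILE, should_log, LoggingProxyHandler and run_proxy are hypothetical names, and the forwarding is simplified (only User-Agent is passed on, and redirects are followed by urllib rather than handed back to the browser).

```python
import re
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical settings: which URLs to record and where to record them.
URL_FILTER = re.compile(r'^https?://')
LOG_FILE = 'visited.log'


def should_log(url, pattern=URL_FILTER):
    """Return True if the requested URL matches the regular expression."""
    return pattern.search(url) is not None


class LoggingProxyHandler(BaseHTTPRequestHandler):
    """Forwards plain-HTTP GET requests and logs the requested URLs."""

    def do_GET(self):
        # A browser talking to an HTTP proxy puts the absolute URL in the
        # request line, so self.path looks like 'http://example.com/page'.
        url = self.path
        if should_log(url):
            with open(LOG_FILE, 'a', encoding='utf-8') as log:
                log.write(url + '\n')

        headers = {}
        if 'User-Agent' in self.headers:
            headers['User-Agent'] = self.headers['User-Agent']
        try:
            request = urllib.request.Request(url, headers=headers)
            with urllib.request.urlopen(request) as resp:
                status, body = resp.status, resp.read()
                ctype = resp.headers.get('Content-Type', 'application/octet-stream')
        except urllib.error.HTTPError as err:
            # Pass error pages (404, 500, ...) through to the browser.
            status, body = err.code, err.read()
            ctype = err.headers.get('Content-Type', 'text/plain')
        except (urllib.error.URLError, ValueError) as err:
            self.send_error(502, 'Upstream request failed: {}'.format(err))
            return

        self.send_response(status)
        self.send_header('Content-Type', ctype)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run_proxy(host='127.0.0.1', port=8080):
    """Serve until interrupted; point Firefox's HTTP proxy setting here."""
    with ThreadingHTTPServer((host, port), LoggingProxyHandler) as server:
        server.serve_forever()
```

Usage: call `run_proxy()`, then set Firefox's manual HTTP proxy to 127.0.0.1, port 8080; matching URLs are appended to visited.log.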

mico
Jacklapott