24

My aim is to detect the unvisited links on a webpage and then create a Greasemonkey script to click on those links. By unvisited links, I mean links that I have not opened. Since every browser can style visited and unvisited links in different colors, is it possible to detect these links in some way? While searching, I came upon this link: http://www.mozdev.org/pipermail/greasemonkey/2005-November/006821.html but someone here told me that this is no longer possible. Please help.

Chetan

2 Answers

25

Correct, it is not possible for JavaScript to detect whether a link is visited in either Firefox or Chrome -- which are the only 2 browsers applicable in this Greasemonkey context.

That is because Firefox and Chrome take security and privacy seriously. From the CSS2 spec:

Note. It is possible for style sheet authors to abuse the :link and :visited pseudo-classes to determine which sites a user has visited without the user's consent.

UAs may therefore treat all links as unvisited links, or implement other measures to preserve the user's privacy while rendering visited and unvisited links differently. See [P3P] for more information about handling privacy.

See also, "Privacy and the :visited selector"
You can see a demo showing that secure-ish browsers will not let you sniff visited links at jsfiddle.net/n8F9U.
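
To make the restriction concrete, here is a minimal sketch of the color-comparison check that history sniffers relied on. The function name `looksVisited`, the `win` parameter, the stand-in window object, and the color value are illustrative assumptions, not code from the linked demo. Current Firefox and Chrome report the unvisited style to `getComputedStyle` for every link, so a check like this gives the same answer whether or not the link was really visited.

```javascript
// Hypothetical helper: compares a link's computed color with the color the
// page's CSS assigns to unvisited links.  `win` is any object exposing
// getComputedStyle (normally the browser's `window`).
function looksVisited (link, win, unvisitedColor) {
    var computed = win.getComputedStyle (link, null);
    return computed.color !== unvisitedColor;
}

// Stand-in for a privacy-preserving browser, so the sketch can run outside
// a page: it reports the unvisited color no matter what the link's real
// state is.
var privacyPreservingWindow = {
    getComputedStyle: function (link) {
        return { color: "rgb(0, 0, 238)" };
    }
};

// A fake link that, in this illustration, really has been visited.
var fakeLink = { href: "http://example.com/", reallyVisited: true };
var detected = looksVisited (fakeLink, privacyPreservingWindow, "rgb(0, 0, 238)");

console.log (detected);  // false: the visited state is not observable.
```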




For your specific situation, because you are visiting a page and keeping it open, you can help a script keep track of what links were visited. It's not fool-proof, but I believe it will do what you've asked for.

First, see the script in action by doing the following:

  1. Install the script, as is.
  2. Browse to the test page, jsbin.com/eledog.
    The test page adds a new link, every time it is reloaded or refreshed.
  3. The GM script adds 2 buttons to the pages it runs on: a "Start/Stop" button in the upper left and a "Clear" button in the lower right.

    When you press the "Start" button, it does the following:

    1. All existing links on the page are logged as "visited".
    2. It starts a timer (default setting: 3 seconds). When the timer goes off, it reloads the page.
    3. Each time the page reloads, it opens any new links and kicks off a new reload-timer.
    4. Press the "Stop" button to stop the reloads; the list of visited links is preserved.

    The "Clear" button erases the list of visited links.
    WARNING: If you press "Clear" while the refresh loop is active, then the next time the page reloads, all links will be opened in new tabs.
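
The Start, reload, and Clear behavior described above boils down to one rule: a link is opened only if its URL is not yet in the stored list. Here is a minimal sketch of that rule, independent of the browser. The names `makeTracker`, `markAll`, `takeNew`, and `clear` are illustrative and are not the ones used in the script.

```javascript
// Illustrative sketch of the visited-list rule (all names are hypothetical).
function makeTracker () {
    var visited = [];

    return {
        // "Start" pressed: record every link currently on the page.
        markAll: function (hrefs) {
            hrefs.forEach (function (href) {
                if (visited.indexOf (href) === -1) {
                    visited.push (href);
                }
            } );
        },
        // After each reload: return only the unseen links and remember them.
        takeNew: function (hrefs) {
            var fresh = [];
            hrefs.forEach (function (href) {
                if (visited.indexOf (href) === -1) {
                    visited.push (href);
                    fresh.push (href);
                }
            } );
            return fresh;
        },
        // "Clear" pressed: forget everything.
        clear: function () {
            visited = [];
        }
    };
}

var tracker     = makeTracker ();
tracker.markAll (["/a", "/b"]);                         // "Start" pressed.
var firstReload = tracker.takeNew (["/a", "/b", "/c"]); // Only "/c" is new.
tracker.clear ();                                       // "Clear" pressed mid-loop.
var afterClear  = tracker.takeNew (["/a", "/b", "/c"]); // All three are "new" again.
```

The last two lines reproduce the situation in the warning: once the list is cleared while the loop is running, every link on the next reload counts as new.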


Next, to use the script on your site...

Carefully read the comments in the script; you will have to change the @include, @exclude, and selectorStr values to match the site you are using.
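
As an illustration of those three edits, the changed lines for a hypothetical forum might look like the following. The domain `forum.example.com`, the `/admin/` path, and the CSS selector are placeholders, not values from any real site; substitute ones that match your page. Only the lines that change are shown.

```javascript
// @include     http://forum.example.com/*
// @exclude     http://forum.example.com/admin/*

// Placeholder selector: any valid jQuery selector that matches only the
// links you want tracked.
var selectorStr = 'table.threads a.threadTitle';
```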

For best results, disable any "Reload Every" add-ons, or "Autoupdate" options.


Important notes:

  1. The script has to use permanent storage to track the links.
    The options are: cookies, sessionStorage, localStorage, globalStorage, GM_setValue(), and IndexedDB.

    These all have drawbacks, and in this case (single site, potentially huge number of links, multiple sessions), localStorage is the best choice (IndexedDB might be, but it is still too unstable -- causing frequent FF crashes on my machine).

    This means that links can only be tracked on a per-site basis, and that "security", "privacy", or "cleaner" utilities can block or erase the list of visited links. (Just like, clearing the browser's history will reset any CSS styling for visited links.)

  2. The script is Firefox-only, for now. It probably will not work on Chrome, even with Tampermonkey installed, without a little re-engineering.
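
To illustrate note 1, here is a minimal sketch of keeping a link list in a Web Storage object under numbered `Visited_N` keys, the same key pattern the script uses. The helper names `loadVisited`, `saveVisited`, and `makeMemoryStorage` are illustrative assumptions. `makeMemoryStorage` is only a stand-in so that the sketch can run outside a browser; in a page you would pass `localStorage` instead.

```javascript
// Stand-in implementing the subset of the Web Storage interface used below.
function makeMemoryStorage () {
    var data = {};
    return {
        get length () { return Object.keys (data).length; },
        key: function (i) {
            var name = Object.keys (data)[i];
            return name === undefined ? null : name;
        },
        getItem: function (name) {
            return Object.prototype.hasOwnProperty.call (data, name) ? data[name] : null;
        },
        setItem:    function (name, value) { data[name] = String (value); },
        removeItem: function (name) { delete data[name]; }
    };
}

// Read every stored link back into an array, skipping unrelated keys.
function loadVisited (storage) {
    var links = [];
    for (var i = 0;  i < storage.length;  i++) {
        var name = storage.key (i);
        if (/^Visited_\d+$/.test (name) ) {
            links.push (storage.getItem (name) );
        }
    }
    return links;
}

// Store one more link under the next numbered key.
function saveVisited (storage, index, href) {
    storage.setItem ('Visited_' + index, href);
}

var storage = makeMemoryStorage ();
saveVisited (storage, 0, 'http://example.com/a');
saveVisited (storage, 1, 'http://example.com/b');
storage.setItem ('inRefreshCycle', '1');    // Unrelated key; ignored on load.
var restored = loadVisited (storage);       // The two saved links.
```

Because `localStorage` is scoped to one origin, a list saved this way on one site is not visible on another, which is why the tracking is per-site.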



The script:

/*******************************************************************************
**  This script:
**      1)  Keeps track of which links have been clicked.
**      2)  Refreshes the page at regular intervals to check for new links.
**      3)  If new links are found, opens those links in a new tab.
**
**  To Set Up:
**      1)  Carefully choose and specify `selectorStr` based on the particulars
**          of the target page(s).
**          The selector string uses any valid jQuery syntax.
**      2)  Set the @include, and/or, @exclude, and/or @match directives as
**          appropriate for the target site.
**      3)  Turn any "Auto update" features off.  Likewise, do not use any
**          "Reload Every" addons.  This script will handle reloads/refreshes.
**
**  To Use:
**      The script will place 2 buttons on the page: A "Start/Stop" button in
**      the upper left and a "Clear" button in the lower right.
**
**      Press the "Start" button to start the script reloading the page and
**      opening any new links.
**      When the button is pressed, it is assumed that any existing links have
**      been visited.
**
**      Press the "Stop" button to halt the reloading and link opening.
**
**      The "Clear" button erases the list of visited links -- which might
**      otherwise be stored forever.
**
**  Methodology:
**      Uses localStorage to track state-machine state, and to keep a
**      persistent list of visited links.
**
**      Implemented with jQuery and some GM_ functions.
**
**      For now, this script is Firefox-only.  It probably will not work on
**      Chrome, even with Tampermonkey.
*/
// ==UserScript==
// @name        _New link / visited link, tracker and opener
// @include     http://jsbin.com/*
// @exclude     /\/edit\b/
// @require     http://ajax.googleapis.com/ajax/libs/jquery/1.7.2/jquery.min.js
// @grant       GM_addStyle
// @grant       GM_openInTab
// ==/UserScript==
/*- The @grant directive is needed to work around a design change
    introduced in GM 1.0.   It restores the sandbox.
*/

//--- Key control/setup variables:
var refreshDelay    = 3000;    //-- milliseconds.
var selectorStr     = 'ul.topicList a.topicTitle';

//--- Add the control buttons.
$("body")  .append (  '<div id="GM_StartStopBtn" class="GM_ControlWrap">'
                    + '<button>Start checking for new links.</button></div>'
            )
           .append (  '<div id="GM_ClearVisitListBtn" class="GM_ControlWrap">'
                    + '<button>Clear the list of visited links.</button></div>'
            );
$('div.GM_ControlWrap').hover (
    function () { $(this).stop (true, false).fadeTo ( 50, 1); },
    function () { $(this).stop (true, false).fadeTo (900, 0.8); }// Coordinate with CSS.
);

//--- Initialize the link-handler object, but wait until the load event.
var stateMachine;
window.addEventListener ("load", function () {
        stateMachine    = new GM_LinkTrack (    selectorStr,
                                                '#GM_StartStopBtn button',
                                                '#GM_ClearVisitListBtn button',
                                                refreshDelay
                                            );

        /*--- Display the current number of visited links.
            We only update once per page load here.
        */
        var numLinks    = stateMachine.GetVisitedLinkCount ();
        $("body").append ('<p>The page opened with ' + numLinks + ' visited links.</p>');
    },
    false
);


/*--- The link and state tracker object.
    Public methods:
        OpenAllNewLinks ()
        StartStopBtnHandler ()
        ClearVisitedLinkList ()
        StartRefreshTimer ();
        StopRefreshTimer ();
        SetAllCurrentLinksToVisited ()
        GetVisitedLinkCount ()
*/
function GM_LinkTrack (selectorStr, startBtnSel, clearBtnSel, refreshDelay)
{
    var visitedLinkArry = [];
    var numVisitedLinks = 0;
    var refreshTimer    = null;
    var startTxt        = 'Start checking for new links.';
    var stopTxt         = 'Stop checking links and reloading.';

    //--- Get visited link-list from storage.
    for (var J = localStorage.length - 1;  J >= 0;  --J) {
        var itemName    = localStorage.key (J);

        if (/^Visited_\d+$/i.test (itemName) ) {
            visitedLinkArry.push (localStorage[itemName] );
            numVisitedLinks++;
        }
    }

    function LinkIsNew (href) {
        /*--- If the link is new, adds it to the list and returns true.
            Otherwise returns false.
        */
        if (visitedLinkArry.indexOf (href) == -1) {
            visitedLinkArry.push (href);

            var itemName    = 'Visited_' + numVisitedLinks;
            localStorage.setItem (itemName, href);
            numVisitedLinks++;

            return true;
        }
        return false;
    }

    //--- For each new link, open it in a separate tab.
    this.OpenAllNewLinks        = function ()
    {
        $(selectorStr).each ( function () {

            if (LinkIsNew (this.href) ) {
                GM_openInTab (this.href);
            }
        } );
    };

    this.StartRefreshTimer      = function () {
        if (typeof refreshTimer != "number") {
            refreshTimer        = setTimeout ( function() {
                                        window.location.reload ();
                                    },
                                    refreshDelay
                                );
        }
    };

    this.StopRefreshTimer       = function () {
        if (typeof refreshTimer == "number") {
            clearTimeout (refreshTimer);
            refreshTimer        = null;
        }
    };

    this.SetAllCurrentLinksToVisited = function () {
        $(selectorStr).each ( function () {
            LinkIsNew (this.href);
        } );
    };

    this.GetVisitedLinkCount = function () {
        return numVisitedLinks;
    };

    var context = this; //-- This seems clearer than using `.bind(this)`.
    this.StartStopBtnHandler    = function (zEvent) {
        if (inRefreshCycle) {
            //--- "Stop" pressed.  Stop searching for new links.
            $(startBtnSel).text (startTxt);
            context.StopRefreshTimer ();
            localStorage.setItem ('inRefreshCycle', '0'); //Set false.
        }
        else {
            //--- "Start" pressed.  Start searching for new links.
            $(startBtnSel).text (stopTxt);
            localStorage.setItem ('inRefreshCycle', '1'); //Set true.

            context.SetAllCurrentLinksToVisited ();
            context.StartRefreshTimer ();
        }
        inRefreshCycle  ^= true;    //-- Toggle value.
    };

    this.ClearVisitedLinkList   = function (zEvent) {
        numVisitedLinks = 0;
        visitedLinkArry = [];   //-- Also reset the in-memory list.

        for (var J = localStorage.length - 1;  J >= 0;  --J) {
            var itemName    = localStorage.key (J);

            if (/^Visited_\d+$/i.test (itemName) ) {
                localStorage.removeItem (itemName);
            }
        }
    };

    //--- Activate the buttons.
    $(startBtnSel).click (this.StartStopBtnHandler);
    $(clearBtnSel).click (this.ClearVisitedLinkList);

    //--- Determine state.  Are we running the refresh cycle now?
    var inRefreshCycle  = parseInt (localStorage.inRefreshCycle, 10)  ||  0;
    if (inRefreshCycle) {
        $(startBtnSel).text (stopTxt); //-- Change the btn label to "Stop".
        this.OpenAllNewLinks ();
        this.StartRefreshTimer ();
    }
}

//--- Style the control buttons.
GM_addStyle ( "                                                             \
    .GM_ControlWrap {                                                       \
        opacity:            0.8;    /*Coordinate with hover func. */        \
        background:         pink;                                           \
        position:           fixed;                                          \
        padding:            0.6ex;                                          \
        z-index:            666666;                                         \
    }                                                                       \
    .GM_ControlWrap button {                                                \
        padding:            0.2ex 0.5ex;                                    \
        border-radius:      1em;                                            \
        box-shadow:         3px 3px 3px gray;                               \
        cursor:             pointer;                                        \
    }                                                                       \
    .GM_ControlWrap button:hover {                                          \
        color:              red;                                            \
    }                                                                       \
    #GM_StartStopBtn {                                                      \
        top:                0;                                              \
        left:               0;                                              \
    }                                                                       \
    #GM_ClearVisitListBtn {                                                 \
        bottom:             0;                                              \
        right:              0;                                              \
    }                                                                       \
" );
Brock Adams
  • You're ignoring the fact that JavaScript can run in various security contexts. Greasemonkey is a Firefox add-on, and as such certainly does have access to the visited state of links. However, I don't think this is exposed to user scripts. – user123444555621 Sep 03 '11 at 07:31
  • @Pumbaa80: It *may* be true that **add-ons** can detect visited links, but they are also not limited to JS. Irregardless, Greasemonkey does not expose any enhanced style or visited-state information to GM **scripts**. – Brock Adams Sep 03 '11 at 07:40
  • Thank you for replying. On my page there is an "Auto update" option; I don't know how it works, but I have turned it off and am continuously refreshing the page with the Firefox add-on "Reload Every" to check for new links. Let me know if that answers your question. – Chetan Sep 03 '11 at 09:25
  • To add to my previous comment: in my case I only want to know the unvisited links, not the visited ones. Is that possible after the fix? I also found one more webpage where the author does something like this after the security fix: http://www.webdesignfromscratch.com/html-css/getting-around-the-css-history-leak-limitations/ – Chetan Sep 03 '11 at 09:41
  • @Chetan, that second article is just styling tricks, still won't help you. However, given your usage pattern, there is a workaround possible. I'll iron out the kinks and post it in about a day, if nobody beats me to it. – Brock Adams Sep 03 '11 at 09:58
  • @brock Thanks a lot for all your help. I will be looking forward to your post – Chetan Sep 03 '11 at 10:32
  • Okay, I've added a link-tracking script to my answer. It will do what you asked for. – Brock Adams Sep 05 '11 at 02:46
-2

You can parse all links on the page and get their CSS color property. If the color of a link matches the color you defined in CSS for unvisited links, then the link is unvisited.

This kind of technique is usually used to find all visited links. It is a sort of security breach that lets you determine whether a user has visited a particular website, and it is usually used by sleazy marketers.

Tricks of this kind are usually classified as "browser history manipulation tricks".

More info with code: http://www.stevenyork.com/tutorial/getting_browser_history_using_javascript

rinchik