5

I want to know whether it's possible to change the URL shown in the address bar, change the page content accordingly, and make both the URLs and the content robot friendly (meaning robots can actually index them).

I've already tried using AJAX to load the data dynamically and using AngularJS routing, but neither can be indexed by robots.

Also, pretty URLs and query strings are not what I'm looking for. I'm looking for an approach that renders the data on landing and then changes the route and the content when a link is clicked, without a page refresh, and I don't want to write the code twice (once on the server side and once on the front end).

Those are the things I've already tried; any help or direction toward a solution would be appreciated.

UPDATE

A library-free solution/structure that works with any language and has no dependencies would be the most accurate answer!

Amin Jafari
  • You are mixing 2 different concepts ... SEO and human usable routing. For some search engines both are the same. SEO approach is dependent on what the search engines support. Google has claimed for quite some time now to support ajax driven sites...I'm not sure about status for other search engines. Strongly suggest you read the google webmaster guidelines on this topic ...as well as other search engines – charlietfl May 29 '16 at 16:05
  • http://stackoverflow.com/questions/13499040/how-do-search-engines-deal-with-angularjs-applications – maksbd19 May 29 '16 at 16:07
  • Also there are various methods available to serve non ajax representation of your pages if you deem it necessary – charlietfl May 29 '16 at 16:07
  • google suggest to use **pretty urls** for the ajax driven web applications, but I wanted to know if there's another way. @charlietfl – Amin Jafari May 29 '16 at 16:12
  • @AminJafari google doesn't care if you use pretty url's or not as regards to using hash based angular vs html5Mode routing. the guidelines explain how to use `hashbang` if needed – charlietfl May 29 '16 at 16:13
  • @charlietfl pretty urls is a term which google uses to describe the supported links for the ajax driven websites, believe me they are not pretty, and google suggests it just in case – Amin Jafari May 29 '16 at 16:16
  • Google also outlines various ways to provide non ajax versions of pages.... **if you think it is necessary**. the last part is up to you and is my emphasis – charlietfl May 29 '16 at 16:16
  • what do you mean by non ajax? @charlietfl – Amin Jafari May 29 '16 at 16:19
  • I mean cached versions or headless browser output versions of the ajax content. – charlietfl May 29 '16 at 16:21
  • alright but what if I can't use the server to handle the robots, there must be a simpler way – Amin Jafari May 29 '16 at 16:25
  • Yes...and again there are various approaches outlined in the google web master docs. Personally I think you can reasonably take approach that bots are now reading ajax per https://webmasters.googleblog.com/2015/10/deprecating-our-ajax-crawling-scheme.html – charlietfl May 29 '16 at 16:29
  • 1
    @maksbd19 http://stackoverflow.com/a/23245379/2832571 this was great – Amin Jafari May 29 '16 at 16:32
  • @charlietfl thank you so much for the information – Amin Jafari May 29 '16 at 16:32
  • Meteor provides a pretty smart way of dealing with this problem - a package called "Spiderable" takes care of preparing a special version of the application that the crawlers can understand, using PhantomJS to render "static" HTML. – jkondratowicz Jun 01 '16 at 11:38
  • @jkondratowicz thanks but I don't intend to use a library just for this case (maybe none at all) also my website is not going to be "static" by any definition – Amin Jafari Jun 01 '16 at 12:09

5 Answers

5

Here is something that could represent a starting point for a solution. Before you read on, these are the main things to keep in mind about my answer:

  • all vanilla javascript
  • ajax call to load new content
  • change url on address bar without reloading the page
  • register url changes in browser history
  • seo friendly

But be aware that all of this is presented as draft code meant to explain the solution; you'll need to improve it if you want to use it in production.

Let's start with the index page.

index.php

<!DOCTYPE html>
<html>
<head>
    <title>Sample page</title>
    <meta charset="UTF-8">
    <script type="text/javascript" src="ajax_loader.js"></script>
</head>
<body>

<h1>Some static content</h1>
<a href="?main_content=external_content.php">
    Link to load dynamic content
</a>
<div id="main_content">
    <!--
        Here is where your dynamic content will be loaded.

        You can have as many dynamic containers as you like.

        In my basic example you can attach one link to a
        single container but you can implement a more
        complete solution to handle multiple containers
        at the same time
    -->

    <!-- Leave this empty for the moment... some php will follow -->
</div>
</body>
</html>

Now let's see how the JavaScript can handle the links and load content with AJAX.

ajax_loader.js

window.onload = function() {

        var load = function(e) {
            // exit if target is undefined
            if(typeof(e.target) == 'undefined' ) {return;}

            // exit if clicked element is not a link
            if (e.target.tagName !== 'A') {return;}

            // prevent the browser from following the link
            // (use the handler argument "e", not the global "event")
            e.preventDefault();

            // get href from clicked element
            var href = e.target.getAttribute("href");

            // retrieve container and source from "?container=source"
            var href_parts = href.split('=');
            var container = href_parts[0].substr(1);
            var source = href_parts[1];

            // instantiate a new request
            var request = new XMLHttpRequest();

            // define the success handler before the request is sent
            var successCallback = function() {
                // on success place response content in the specified container
                document.getElementById(container).innerHTML = request.responseText;

                // change url in the address bar and save it in the history
                history.pushState('','',"?"+container+"="+source);
            };

            // bind a function to handle request status
            request.onreadystatechange = function() {
                if(request.readyState < 4) {
                    // handle preload
                    return;
                }
                if(request.status !== 200) {
                    // handle error
                    return;
                }
                if(request.readyState === 4) {
                    // handle successful request
                    successCallback();
                }
            };

            // open the request to the specified source
            request.open('GET', source, true);
            // execute the request
            request.send();
        };

        // add an event listener to the entire document.
        document.addEventListener('click', load, false);
        // the reason why the event listener is attached
        // to the whole document and not only to the <a>
        // elements in the page is that otherwise the links
        // included in the dynamic content would not
        // listen to the click event

    };

Now let's take another look at some specific elements of our HTML.

As said before, the proposed script attaches the behavior to any link; you only need to format the href so it can be read properly by the load() function. The format is "?container_name=filename.php", where container_name is the id of the div you want the content loaded into, and filename.php is the name of the file called by AJAX to retrieve the content.

So if you have some content in your 'external_content.php' file and want it loaded into the div with id 'main_content', here is what you do:

<a href="?main_content=external_content.php">Your link</a>
<div id="main_content"></div>

In this example the div 'main_content' is empty when the page first loads and will be populated, on click of your link, with the content of the external_content.php file. At the same time the address bar of your browser will change from http://www.example.com/index.php to http://www.example.com/index.php?main_content=external_content.php, and this new URL will be registered in your browser history.
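
One thing the draft above does not handle is the browser back/forward buttons: pushState changes the history, but going back will not re-load the matching content by itself. A possible addition, not part of the original draft, is a popstate listener that re-reads the same "?container=source" query string; loadInto() here is a hypothetical helper that would wrap the XMLHttpRequest logic shown in ajax_loader.js.

window.addEventListener('popstate', function() {
    // e.g. "?main_content=external_content.php", or "" on the plain index
    var query = window.location.search;

    if (!query) {
        // back on the bare index page: restore the default content
        loadInto('main_content', 'default_main_content.php'); // hypothetical helper
        return;
    }

    var parts = query.substr(1).split('=');
    loadInto(parts[0], parts[1]); // hypothetical helper wrapping the ajax load
});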

Now let's go further and see how we can make this SEO friendly so that http://www.example.com/index.php?main_content=external_content.php is a real address and the 'main_content' div is not empty when we load the page.

We can just add some PHP code to handle this. (Please note that you could even write some JavaScript to do a similar job, but since you mentioned the use of a server-side language I decided to go with PHP.)

<a href="?main_content=external_content.php">Load</a>
<div id="main_content">
    <?php dynamicLoad('main_content','default_main_content.php'); ?>
</div>

Before showing it I want to explain what the PHP function dynamicLoad() does. It takes two parameters: the first is the container id, the second is the file that holds the default content. To be clearer, if the requested URL is http://www.example.com/ the function will put the content of default_main_content.php in the main_content div, but if the URL requested by the browser is http://www.example.com/index.php?main_content=external_content.php then the function will put the content of external_content.php in the main_content div.

This mechanism makes the page both SEO friendly and user friendly: when a search engine crawler follows the href "?main_content=external_content.php", which leads to the URL "http://www.example.com/index.php?main_content=external_content.php", it will find the same content that is displayed dynamically by the AJAX call. The same is true for a user who reloads the page with a refresh or reaches it from the history.

Here is the simple dynamicLoad() PHP function:

<?php
    function dynamicLoad($container, $defaultSource){
        $loadSource = $defaultSource;
        if(isset($_GET[$container])) {
            // draft only: in production you must whitelist the allowed
            // sources, never include a raw GET parameter directly
            $loadSource = $_GET[$container];
        }
        include($loadSource);
    }
?>

As said in the first lines, this is not code ready for production; it's just the explanation of a possible solution to the request you made:

to change the url showing and according to that change the content of the page and make the urls and the content of the page to be robot friendly

Igor S Om
  • thank you so much for the detailed explanation Igor, your answer is almost the same as @Ardeshir explained before, so the only problem I can think of for your solution is that (as I commented on Ardeshir's answer) the data transfer rate of the server would be awfully high! because you have to get the rendered page from the server, so imagine a page of 100 search content and 20,000+ users requesting it, the server will explode I guess. but since you were the only one providing a solution in pure js and also going through some detail I'll accept your answer! thank you so much again – Amin Jafari Jun 04 '16 at 04:11
  • I don't know if you already have something in place for your project, but consider that solid infrastructures like Cloud and Amazon WS can offer better performance than what you expect. Depending on the kind of application you should also consider to make the right choice in terms of technologies Php vs Node.js, SQL vs MondoDB, and do not limit your researches to these. Solr and Spark are two interesting Apache solutions for data driven applications... It all depends on your needs. – Igor S Om Jun 04 '16 at 11:04
  • On the other hand also consider that you can distinguish between traffic coming from search engine crawlers and browsers and handle requests with some differences, for example using always ajax for loading content inside dynamic containers, even at first load, and load the content all in once for web crawlers, if you really feel this can impact your project – Igor S Om Jun 04 '16 at 11:08
  • "you can distinguish between traffic coming from search engine crawlers and browsers" that's exactly what I was trying to avoid! also now that it's out there, I've actually created a structure that handles all of the things I asked (also handles all the data binding at the client side so the data transfer rate comes to the minimum) and it is not dependent on any libraries, I just wanted to see if somebody has already done it before publishing it. your answer was the closest to my approach, thank you so much again ;) – Amin Jafari Jun 04 '16 at 15:56
3

If you really care about your SEO you should not use AJAX to fill your site dynamically. This is not so much about Google's spiders, because they can read JavaScript to a reasonable degree, but about other search engines' spiders.

The best and oldest approach is to use normal routes, but you can simulate them with Node.js and React so you can still use JavaScript to fill your content; this is called isomorphic JavaScript, if I have it correctly.

http://nerds.airbnb.com/isomorphic-javascript-future-web-apps/

update:

An application that can only run in the client-side cannot serve HTML to crawlers, so it will have poor SEO by default. Web crawlers function by making a request to a web server and interpreting the result; but if the server returns a blank page, it’s not of much value. There are workarounds, but not without jumping through some hoops. (source: Airbnb website)

The difference is that the user experiences the speed of client-side rendering while the web crawlers get their content from the server.
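
To make the idea more concrete, here is a rough sketch of the isomorphic approach, assuming Node.js with Express and not tied to React; the file names, route and data are made up for illustration. The point is that the same rendering function produces the HTML on the server for crawlers and first visits, and can update the DOM in the browser afterwards.

shared.js

// shared.js: the same rendering function is used on the server and in the browser
function renderProductList(products) {
    return '<ul>' + products.map(function (p) {
        return '<li>' + p.name + '</li>';
    }).join('') + '</ul>';
}
// export for Node, stay global in the browser
if (typeof module !== 'undefined') { module.exports = renderProductList; }

server.js

// server.js: Node + Express, crawlers and first-time visitors receive full HTML
var express = require('express');
var renderProductList = require('./shared');

var app = express();

app.get('/products', function (req, res) {
    // hypothetical data; in a real app this would come from a database
    var products = [{ name: 'First product' }, { name: 'Second product' }];
    res.send(
        '<!DOCTYPE html><html><body>' +
        '<div id="main">' + renderProductList(products) + '</div>' +
        '<script src="/shared.js"></script>' +
        '</body></html>'
    );
});

app.listen(3000);

In the browser, later navigations can fetch JSON with AJAX and call the same renderProductList() to update the page without a refresh, so the markup logic is written only once.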

J. Overmars
  • There is a slight difference between the dynamic call and the isomorphic way. I can quote the complete website but maybe the best is to read it carefully. – J. Overmars Jun 01 '16 at 11:33
  • @J.Overmars thanks for the information, I'll take a look at your link, doesn't sound like a simple thing to do (which is a downside) +1 anyways... – Amin Jafari Jun 01 '16 at 12:07
  • I know there is no simple answer to your question, and it is not really that hard to implement because you can use the same JavaScript but on the server side. – J. Overmars Jun 01 '16 at 12:16
  • I'll read the article you shared and discuss it further with you later, thanks again – Amin Jafari Jun 01 '16 at 12:18
  • I read the article, there is a small problem with it where you have to use some libraries or tools like node.js (which is bad because it limits you), a simple no library solution would be the best way to go in my opinion – Amin Jafari Jun 01 '16 at 14:35
3

There is actually a way to do this. I use AngularJS, and as someone has rightly pointed out, Google will now index your site with no issues (as of June last year, if I remember correctly). Facebook and other such sites don't crawl JavaScript, so for them you need another solution.

The easiest way is to use prerender.io as it will use the _escaped_fragment_ parameter and generate HTML snapshots of your pages. These will be generated only when the page has been requested.
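
For example, if your app happens to be served by Node with Express, prerender.io's own prerender-node middleware is the usual way to plug it in; a minimal sketch follows (the token is a placeholder).

var express = require('express');
var app = express();

// serve prerendered snapshots to crawlers, the normal JS app to browsers
app.use(require('prerender-node').set('prerenderToken', 'YOUR_TOKEN'));

app.use(express.static('public'));
app.listen(3000);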

Other than that, the only other real solution is to use PhantomJS and create the snapshots yourself. This is not as difficult as it might sound. You can use something like Gulp or Grunt to generate the snapshots when you change your views.
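
As a rough idea of what such a snapshot script looks like, here is a minimal PhantomJS sketch that could be wired into a Gulp or Grunt task; the URL, output path and timeout are placeholders.

// snapshot.js (run with: phantomjs snapshot.js)
var page = require('webpage').create();
var fs = require('fs');
var url = 'http://localhost:3000/#!/some-view'; // placeholder URL

page.open(url, function (status) {
    if (status !== 'success') {
        console.log('Failed to load ' + url);
        phantom.exit(1);
    }
    // give the JS application a moment to render before saving the snapshot
    window.setTimeout(function () {
        fs.write('snapshots/some-view.html', page.content, 'w');
        phantom.exit();
    }, 1000);
});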

I hope that helps.

r3plica
2

You can use server-side rendering with tools like React, and inside your application (JS) simply change the URL with history.pushState().
For your first concern (changing the URL), please take a look at this example.
For your second concern (writing the code once), using a server-side rendering method will solve the problem because it renders the required elements into an HTML string and then sends it as a response to the client.
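
As a rough illustration of that second point, here is a minimal sketch of server-side rendering with Express and ReactDOMServer; the Page component is hypothetical, but in a real app it would be the same component your client bundle uses, so the markup logic is written only once.

var express = require('express');
var React = require('react');
var ReactDOMServer = require('react-dom/server');

// hypothetical component, shared with the client bundle in a real app
function Page(props) {
    return React.createElement('div', null, 'Hello, ' + props.name);
}

var app = express();

app.get('/', function (req, res) {
    // render the element tree to an HTML string and send it as the response
    var html = ReactDOMServer.renderToString(
        React.createElement(Page, { name: 'visitor' })
    );
    res.send('<!DOCTYPE html><html><body><div id="root">' + html + '</div></body></html>');
});

app.listen(3000);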

  • but won't it be costly if you send the whole page from the server to the client? the data transfer rate would be through the roof IMHO – Amin Jafari Jun 01 '16 at 14:00
  • I assume you are talking about a large page with many elements, like lists, where if we received some raw data and did the templating on the client side we would get a lighter data transfer. That's true, but you need robot-friendly content and there is a trade-off between making content readable for robots and a small increase in page size. My point is that you don't need to write page contents twice, for example once in Razor and once in JS. – Ardeshir Valipoor Jun 01 '16 at 18:19
  • I know but I think we must think a little outside the box to try and reduce the data transfer to have the perfect package! your answer is the closest by far, there's just a little problem of the server traffic... also it would be the best if we didn't have to use libraries or tools to implement the idea, a solution that would work with all languages would be the best. – Amin Jafari Jun 02 '16 at 19:55
  • You can implement this pattern in pure JavaScript and also you are right, we should find a way around the large payloads problem. – Ardeshir Valipoor Jun 03 '16 at 03:47
0

I think the main reason your pages are not getting indexed properly might be the # in the URLs.
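
For reference, if the # really is the issue, AngularJS 1.x can drop it by enabling HTML5 mode; a minimal sketch follows (it also requires a <base href="/"> tag and the server rewriting unknown paths back to index.html).

// app.js: AngularJS 1.x, remove the # from URLs with HTML5 mode
angular.module('myApp', [])
    .config(['$locationProvider', function ($locationProvider) {
        $locationProvider.html5Mode(true);
    }]);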

For SEO with AngularJS, please check the following links,

Chinni
  • my urls don't have `#` , I think you got my question wrong and I am also not using angularjs currently – Amin Jafari Jun 02 '16 at 19:51
  • yeah, I have tried, made it work, but it wasn't quite the desired result (performance and code length, not to mention the added size of the libraries) – Amin Jafari Jun 03 '16 at 05:26
  • I don't understand. Libraries are inevitable for any project. It isn't recommended to re-invent the wheel again and again. Someone has already done that for you so why not just use it? Also could you please tell what performance issues did you encounter while using AngularJS? – Chinni Jun 03 '16 at 05:59
  • in some cases (big project) you need performance, so you have to write some of the libraries and codes yourself, it is not re-inventing the wheel, it is optimizing the wheel for your needs! what we're trying to do in this project is performance first and it must be as lightweight as possible, explaining the details may bore everyone, so let's just say we can't use libraries – Amin Jafari Jun 03 '16 at 06:06