
While requesting HTTP responses with Node.js and importing them into MongoDB, I noticed that one or two URLs return headers whose names contain illegal characters (illegal because the header names are used as keys), which crashes the entire script when I try to import them into MongoDB. An example is below:

{
  "url": "divensurf.com",
  "statusCode": 200,
  "headers": {
    "x-varnish": "2236710953 2236710300",
    "vary": "Accept-Encoding,Cookie,X-UA-Device",
    "cache-control": "max-age=7200, must-revalidate",
    "x-cache": "V1HIT 2",
    "content-type": "text/html; charset=UTF-8",
    "page.ly": "v4.0",
    "x-pingback": "http://divensurf.com/xmlrpc.php",
    "date": "Thu, 21 Mar 2013 19:40:59 GMT",
    "transfer-encoding": "chunked",
    "via": "1.1 varnish",
    "connection": "keep-alive",
    "last-modified": "Thu, 21 Mar 2013 19:40:57 GMT",
    "age": "2"
  }
}

The header/key "page.ly" crashes the script because it contains an illegal character, `.`. Is there a way to sanitize this quoted key/header by removing the illegal characters before I import the document into MongoDB?

Below is the code in which I request responses:

(function (i) {
    http.get(options, function (res) {
        var obj = {};
        obj.url = hostNames[i];
        obj.statusCode = res.statusCode;
        obj.headers = res.headers;

        db.scrape.save(obj); // imports headers into MongoDB
    }).on('error', function (e) {
        console.log("Error: " + hostNames[i] + "\n" + e.stack); // prints error stack onto console
    });
})(i);

For example, the key "page.ly" would become "pagely".

EDIT: SOLVED. Check Gael's answer.

theGreenCabbage

1 Answer

obj.headers = {};
for (var item in res.headers) {
    // the g flag strips every "." in the key, not just the first one
    obj.headers[item.replace(/\./g, '')] = res.headers[item];
}
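
The loop above can also be wrapped in a small reusable function. The following is only a sketch: `sanitizeHeaderKeys` is a hypothetical helper name, not part of Node.js or any MongoDB driver, and it assumes that the only problem characters are `.` anywhere in a key and `$` at the start of a key (the two field-name restrictions MongoDB documents).

```javascript
// Hypothetical helper (not from the original answer): returns a copy of
// `headers` whose keys have every "." and any leading "$" removed.
function sanitizeHeaderKeys(headers) {
  var clean = {};
  Object.keys(headers).forEach(function (key) {
    var safeKey = key.replace(/\./g, '').replace(/^\$+/, '');
    clean[safeKey] = headers[key];
  });
  return clean;
}

// Example with two of the headers from the question.
var sample = { 'page.ly': 'v4.0', 'x-cache': 'V1HIT 2' };
console.log(sanitizeHeaderKeys(sample));
```

In the question's code this would be used as `obj.headers = sanitizeHeaderKeys(res.headers);`. Note that two different header names can collapse to the same key after sanitizing (for example `a.b` and `ab`), in which case the later value overwrites the earlier one.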
Gaël Barbin
  • I will as soon as it allows me. – theGreenCabbage Mar 21 '13 at 20:07
  • Apparently .replace() does not work - it would give me the error `Object #<Object> has no method 'replace'`. Any suggestions? I think it is the `item` in `res.headers` that needs to be regular-expressed, instead of `header`. I could do something like `for(var item in res.headers){item.replace(/\./gi,'');}`. That's the concept, at least, since I still need to wrap it in obj.headers to parse into MongoDB. – theGreenCabbage Mar 21 '13 at 20:23
  • see: http://docs.mongodb.org/manual/reference/limits/#Restrictions%20on%20Field%20Names and http://stackoverflow.com/questions/12397118/mongodb-dot-in-key-name – Gaël Barbin Mar 21 '13 at 20:23
  • You can try: `res.headers.toString().replace(...` – Gaël Barbin Mar 21 '13 at 20:24
  • The output for that in headers is `"headers": "[object Object]",`. I think we can solve this using `for(var item in res.headers){...}` – theGreenCabbage Mar 21 '13 at 20:30
  • yes, you have to make a `.replace()` just on the key. But what was working when you wrote your first comment?? – Gaël Barbin Mar 21 '13 at 20:33
  • `for(var item in res.headers){res.headers[i].key.replace(...);}` – Gaël Barbin Mar 21 '13 at 20:33
  • Am I supposed to replace `key` with something, so it's assigned to something? – theGreenCabbage Mar 21 '13 at 20:40