First of all, you should understand what went wrong.
The HTTP request is asynchronous. This means that the callback you pass to http.get() will run sometime in the future, whereas fs.readFileSync, being synchronous, executes and completes before the HTTP request has even been sent, because both calls are made in the same tick of the event loop. Also note that fs.readFileSync returns the file contents as its return value; it does not take a callback.
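To see the ordering for yourself, here is a small sketch that reads the same file both ways (the scratch file and its name are assumptions made just for the demo). The synchronous result is recorded first, even though fs.readFile is called earlier, because its callback can only run after the current synchronous code has finished:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch file, created only for this demonstration.
const tmp = path.join(os.tmpdir(), 'tick-demo-' + process.pid + '.txt');
fs.writeFileSync(tmp, 'hello');

const order = [];

// Asynchronous: readFile returns immediately; its callback runs on a later
// turn of the event loop, after the current synchronous code has finished.
fs.readFile(tmp, 'utf8', function (err, data) {
  if (err) throw err;
  order.push('async callback: ' + data);
  fs.unlinkSync(tmp);
  console.log(order);
});

// Synchronous: readFileSync blocks, then hands back the contents as its return value.
const contents = fs.readFileSync(tmp, 'utf8');
order.push('sync result: ' + contents);
```

Running it prints `[ 'sync result: hello', 'async callback: hello' ]`.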
Even if you replace fs.readFileSync with fs.readFile, the code still might not work properly, since the readFile operation might run before the HTTP response has been fully read from the socket and written to disk.
I strongly suggest reading this Stack Overflow question and/or "Understanding the node.js event loop".
The correct place to invoke the file read is when the response stream has finished writing to the file, which would look something like this:
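// `url`, `file` (an fs.WriteStream) and `localFile` come from your existing code
var request = http.get(url, function(response) {
    response.pipe(file);
    // 'finish' is emitted once all data has been flushed to the file
    file.once('finish', function () {
        fs.readFile(localFile, 'utf8' /* or whichever encoding you need */, function(err, data) {
            // do something with the data if there is no error
        });
    });
});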
Of course, this is a very raw way to write asynchronous code and not a recommended one, but that is another discussion altogether.
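For reference, here is a self-contained, runnable version of the download-then-read snippet. It is only a sketch under a few assumptions: a throwaway local server stands in for the real site, the file is written to the OS temp directory under a made-up name, and the body is UTF-8 text:

```javascript
const http = require('http');
const fs = require('fs');
const os = require('os');
const path = require('path');

let downloaded = null; // set once the file has been read back from disk

// Assumption: a throwaway local server stands in for the real URL.
const server = http.createServer(function (req, res) {
  res.end('downloaded body');
});

server.listen(0, '127.0.0.1', function () {
  const url = 'http://127.0.0.1:' + server.address().port + '/';
  // Hypothetical destination path for the download.
  const localFile = path.join(os.tmpdir(), 'download-demo-' + process.pid + '.txt');
  const file = fs.createWriteStream(localFile);

  http.get(url, function (response) {
    response.pipe(file);
    // 'finish' fires on the write stream once everything has been flushed,
    // so only now is it safe to read the file back.
    file.once('finish', function () {
      fs.readFile(localFile, 'utf8', function (err, data) {
        server.close();
        if (err) throw err;
        downloaded = data;
        console.log(data); // downloaded body
        fs.unlink(localFile, function () {}); // tidy up the temp file
      });
    });
  });
});
```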
Having said that, if you download a file, write it to disk and then read it all back into memory for manipulation, you might as well skip the file entirely and read the response into a string right away. Your code would then look something like this (there are several ways to implement it):
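var request = http.get(url, function(response) {
    var data = '';
    // decode chunks as UTF-8 strings so multi-byte characters are not split
    response.setEncoding('utf8');
    function read() {
        var chunk;
        // read() returns null once the internal buffer is empty
        while ((chunk = response.read()) !== null) {
            data += chunk;
        }
    }
    response.on('readable', read);
    response.on('end', function () {
        console.log('[%s]', data);
    });
});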
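One of those other ways is to listen for 'data' events instead of 'readable'. The sketch below wraps that in a small helper; fetchText is a hypothetical name of my own, and it assumes the body is UTF-8 text:

```javascript
const http = require('http');

// Hypothetical helper: downloads `url` and passes the whole body to
// `callback(err, text)` as a single string.
function fetchText(url, callback) {
  http.get(url, function (response) {
    let data = '';
    // Decode as UTF-8 so a multi-byte character split across chunks stays intact.
    response.setEncoding('utf8');
    response.on('data', function (chunk) {
      data += chunk;
    });
    response.on('end', function () {
      callback(null, data);
    });
  }).on('error', callback);
}
```

You would call it as `fetchText(url, function (err, text) { ... })`. Note that it does not check response.statusCode, so an error page would be handed back as text too.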
What you really should do, IMO, is create a transform stream that extracts just the data you need from the response without consuming too much memory, which yields this more elegant-looking code:
var request = http.get(url, function(response) {
    // yourTransformStream is a placeholder for your own Transform stream
    response.pipe(yourTransformStream).pipe(file);
});
Implementing this transform stream, however, might prove slightly more complex. So if you're a Node beginner and you don't plan on downloading big files or lots of small files, then loading the whole thing into memory and doing string manipulations on it might be simpler.
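To give an idea of what such a transform stream could look like, here is a sketch that keeps only the lines matching a pattern. The line-filtering job is just an assumed example (swap in whatever you actually need to extract), and createLineFilter is a hypothetical name. The main subtlety is that a line can be split across two chunks, so the incomplete tail is carried over to the next chunk:

```javascript
const { Transform } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical helper: returns a Transform that passes through only the
// lines matching `pattern` (a RegExp without the `g` flag).
function createLineFilter(pattern) {
  const decoder = new StringDecoder('utf8'); // keeps multi-byte chars intact
  let tail = ''; // incomplete last line carried over between chunks

  function pushMatches(stream, text) {
    const lines = text.split('\n');
    tail = lines.pop(); // the last piece may not be a complete line yet
    for (const line of lines) {
      if (pattern.test(line)) stream.push(line + '\n');
    }
  }

  return new Transform({
    transform(chunk, encoding, callback) {
      pushMatches(this, tail + decoder.write(chunk));
      callback();
    },
    flush(callback) {
      // Whatever is left when the input ends is the final line.
      const rest = tail + decoder.end();
      tail = '';
      if (rest !== '' && pattern.test(rest)) this.push(rest + '\n');
      callback();
    }
  });
}
```

It would slot into the pipe chain as `response.pipe(createLineFilter(/href=/)).pipe(file);`. Avoid the `g` flag on the pattern, since `test()` with `g` remembers `lastIndex` between calls and would skip matches.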
For further information about transform streams:
Lastly, see if you can use one of the million Node.js crawlers already out there :-) Take a look at these search results on npm.