
I am having a problem submitting a UTF-8 encoded text file in a POST request using the Node.js "shred" HTTP client.

The text content I am trying to post looks fine on the client side (I know because I console.log it to the screen just before calling client.post), but what the server receives is the content of the text file with the last 2 characters always missing/chopped off. This is not a problem with ANSI text files: if I convert the text file from UTF-8 to ANSI, it arrives at the server complete.

var fs = require('fs');
var Shred = require('shred');

var client = new Shred();
var textToPost = fs.readFileSync("myfile.txt", 'utf8');
console.log(textToPost);

client.post({
  url: "http://www.example.com/readTextFile.php",
  headers: { 'Content-Type': 'application/x-subrip' },
  content: textToPost,
  on: {
    200: function (response) {
      console.log("posted ok");
      console.log(response.content.body);
    },
    500: function (response) {
      asyncCb(new Error('bad response\n' + response.content.body));
    }
  }
});

What is received on the server (by readTextFile.php) is the contents of myfile.txt with the last 2 characters stripped off. I cannot understand why. This has big downstream implications, so patchy workarounds are not likely to help.

I also noticed that when the contents of textToPost are logged to the console, there is a "?" preceding the contents. This does not appear when the file is ANSI encoded.
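In case it helps, here is a quick sketch of checking the raw bytes at the start of the file (reading without an encoding so fs returns a Buffer rather than a decoded string):

var fs = require('fs');

// Read the raw bytes (no encoding argument) and look at the first three.
var firstBytes = fs.readFileSync("myfile.txt").slice(0, 3);
console.log(firstBytes); // shows the first three raw bytes of the file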

Please help.. thank you

Tommy
  • the content of textToPost having a `'?'` as the first char is a bad sign, and probably means that there is nothing wrong with shred, but rather with your input file. – rdrey Aug 29 '12 at 14:08
  • This sure smells like your UTF8 has a byte-order-mark that's messing up the size. http://stackoverflow.com/questions/2223882/whats-different-between-utf-8-and-utf-8-without-bom – JohnnyHK Aug 29 '12 at 14:38
  • Thanks for these comments, much appreciated. This is happening for a large number of UTF8 text files which I am inputting (not just one) and yes, these files all have a BOM which I can see when I open them in a binary editor as EF BB BF (i.e. UTF8). Not sure how this causes an issue when posting it – Tommy Aug 30 '12 at 07:45

1 Answer


OK, after the comments above (thanks rdrey and JohnnyHK), I decided to try stripping the BOM out of the file(s). I used a hex editor, deleted the EF BB BF bytes and saved; this time the file arrived at the server fully intact with no characters missing at the end. Now I'll modify my Node.js code to strip the BOM as well. This doesn't completely answer my question (why the BOM causes an issue). Perhaps shred has a problem posting text files with a BOM. One possibility is that the BOM is 3 bytes on disk but decodes to a single character (U+FEFF), so a Content-Length computed from the string's character length rather than its byte length would come out 2 bytes short, which matches the 2 missing characters. I am not sure.
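For reference, a minimal sketch of stripping the BOM in Node before posting (reusing the myfile.txt name from my question; when the file is read with the 'utf8' encoding, the BOM shows up as a single leading U+FEFF character):

var fs = require('fs');

var textToPost = fs.readFileSync("myfile.txt", 'utf8');

// Drop a leading UTF-8 BOM if one is present; the 'utf8' decoder
// surfaces it as the single character U+FEFF at index 0.
if (textToPost.charCodeAt(0) === 0xFEFF) {
  textToPost = textToPost.slice(1);
}

After this, textToPost can be passed to client.post exactly as in the question.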

Tommy