
When I run:

unzip -p /tmp/document.docx word/document.xml | sed -e 's/<\/w:p>/\\n/g; s/<[^>]\{1,\}>//g; s/[^[:print:]\n]\{1,\}//g'

It correctly extracts the text from my .docx file.

But when I try to wrap this in a Node.js program as follows:

const spawn = require("child_process").spawn;
const command = "unzip";
const child = spawn("sh", ["-c", "unzip -p /tmp/document.docx word/document.xml | sed -e 's/<\/w:p>/\\n/g; s/<[^>]\{1,\}>//g; s/[^[:print:]\n]\{1,\}//g'"]);


const stdout = child.stdout;
const stderr = child.stderr;
let output = ""; // reassigned in the stdout handler, so it cannot be const

stderr.on("data", function(data) {
    console.error("error on stderr", data.toString());
});


stdout.on("data", function(data) {
    output += data;
 });

stdout.on("close", function(code) {

 });

I get the following error message:

error on stderr sed: -e expression #1, char 10: unknown option to `s'

How do I fix this error?

Hoa

1 Answer

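When you embed a command line in a JavaScript string literal like this, Node.js interprets the backslashes before the shell or sed ever sees them, so you have to escape each backslash: one for JavaScript, one for sed. In your code, JavaScript reads "\/" as a plain "/", so sed receives s/</w:p>/... and fails at character 10, where it expects a flag for the s command.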

spawn("sh", ["-c", "unzip -p /tmp/document.docx word/document.xml | sed -e 's/<\\/w:p>/\\\\n/g; s/<[^>]\\{1,\\}>//g; s/[^[:print:]\\n]\\{1,\\}//g'"])
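
To see why sed complained, it can help to print what each literal contains once JavaScript has processed its escapes. This is a minimal sketch that uses only the first substitution from the command above; the variable names `broken` and `fixed` are illustrative and not part of the original code.

```javascript
// The literal as written in the question: JavaScript drops the backslash
// in "\/" and collapses "\\n" to a single backslash followed by "n".
const broken = "s/<\/w:p>/\\n/g";

// The literal with every backslash doubled, as in the corrected spawn call.
const fixed = "s/<\\/w:p>/\\\\n/g";

console.log(broken); // s/</w:p>/\n/g     (sed sees "\" at character 10, where a flag should be)
console.log(fixed);  // s/<\/w:p>/\\n/g   (identical to what was typed in the shell)
```

Logging the command string before handing it to spawn is a quick way to check that it matches what you would type in a terminal.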

As T.J. Crowder explains:

In JavaScript, the backslash has special meaning both in string literals and in regular expressions. If you want an actual backslash in the string or regex, you have to write two: \\.
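
If doubling every backslash feels error-prone, the built-in String.raw tag is one possible alternative: it leaves backslashes in a template literal untouched, so the sed script can be pasted as it appears in the shell. The sketch below only builds and compares the command strings; the names `sedScript`, `commandLine` and `escaped` are illustrative, and it assumes the script contains no backticks or `${` sequences.

```javascript
// String.raw keeps each backslash as typed, so no doubling is needed.
const sedScript = String.raw`s/<\/w:p>/\\n/g; s/<[^>]\{1,\}>//g; s/[^[:print:]\n]\{1,\}//g`;
const commandLine =
  "unzip -p /tmp/document.docx word/document.xml | sed -e '" + sedScript + "'";

// The hand-escaped literal from the answer, for comparison.
const escaped =
  "unzip -p /tmp/document.docx word/document.xml | sed -e 's/<\\/w:p>/\\\\n/g; s/<[^>]\\{1,\\}>//g; s/[^[:print:]\\n]\\{1,\\}//g'";

console.log(commandLine === escaped); // true
```

The resulting `commandLine` can then be passed as the second element of the argument array, as in spawn("sh", ["-c", commandLine]).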

Orelsanpls
  • An alternative to escaping the string yourself is to use the built-in function [String.raw](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/raw) with a template literal. – RickN Mar 06 '19 at 11:19
  • Yup, it's an alternative you can find in the original post by T.J. Crowder. I invite you to go see it and upvote it if you find it useful. – Orelsanpls Mar 06 '19 at 12:20