I am using cheerio to do some simple scraping. I want to scrape content from a website in HTML format, so I use .html() in cheerio.
const content = item.find(`div.message`).html()
The result is an HTML string:
\n\t\t\t\n\t\t\tmua về độ lại xinhan khóa thông minh đèn thấy ok mà <img src=\"/images/smilies/Off/boss.gif\" border=\"0\" alt title=\"Boss\" class=\"inlineimg\">\n\t\t
I want to remove all of the \n and \t characters. The regex I use is:
(\\t\\n|\\n|\\t)
It works perfectly on the regex101 website, where it matches every \n and \t in the string. But when I use the replace method in JavaScript, it does not work.
const content = item.find(`div.message`).html().replace(/(\\t\\n|\\n|\\t)/, "")
The result is still the same string, with every \n and \t intact.
What do I need to change in the code?
Update: more code
I created a new file to test:
const string =
'\n\t\t\t<!-- BEGIN TEMPLATE: ad_showthread_firstpost_start -->\n\n<!-- END TEMPLATE: ad_showthread_firstpost_start -->\n\t\t\ttính tết này súc em winner x , máy thím cho em ý kiến với <img src="/images/smilies/Off/pudency.gif" border="0" alt title="Pudency" class="inlineimg">\n\t\t';
console.log(string.replace(/(\\t\\n|\\n|\\t)/, ""));
The result is the same: the string is printed unchanged.
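For comparison, here is a minimal sketch of what may be going on, assuming the string holds real newline and tab characters (as the test string above does) rather than the two-character sequences backslash + n. Inside a regex literal, `\\n` matches a literal backslash followed by "n", and without the `g` flag only the first match would be replaced. The `sample` value below is a hypothetical stand-in, not the scraped content.

```javascript
// Hypothetical sample input containing real newline and tab characters.
const sample = '\n\t\t\thello <img src="/x.gif">\n\t\t';

// Pattern from the question: `\\n` and `\\t` look for a backslash
// followed by a letter, which this string does not contain.
const unchanged = sample.replace(/(\\t\\n|\\n|\\t)/, "");

// `\n` and `\t` match the control characters themselves;
// the `g` flag replaces every occurrence instead of only the first.
const cleaned = sample.replace(/[\n\t]+/g, "");

console.log(unchanged === sample); // true
console.log(JSON.stringify(cleaned));
```

If the input really did contain literal backslashes (for example, after being JSON-encoded), the doubled-backslash pattern would be the one that matches, so it is worth checking which form the string actually has before choosing.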