I am using cheerio to do some simple scraping.

I want to scrape content from a website. I want to scrape it in HTML format, so I use .html() in cheerio.

const content = item.find(`div.message`).html()

The result is an HTML string:

\n\t\t\t\n\t\t\tmua về độ lại xinhan khóa thông minh đèn thấy ok mà <img src=\"/images/smilies/Off/boss.gif\" border=\"0\" alt title=\"Boss\" class=\"inlineimg\">\n\t\t

I want to remove all the \n\t\n\n\t sequences. The regex I use is:

(\\t\\n|\\n|\\t)

It works perfectly on the regex101 website, where it matches every \n\t\n in the string. But when I use the replace method in JavaScript, it does not work.

const content = item.find(`div.message`).html().replace(/(\\t\\n|\\n|\\t)/, "")

The result is still the same string, with \n\t\n\n\t.

What do I need to change in the code?

Update with more code:

I created a new file to test:

const string =
  '\n\t\t\t<!-- BEGIN TEMPLATE: ad_showthread_firstpost_start -->\n\n<!-- END TEMPLATE: ad_showthread_firstpost_start -->\n\t\t\ttính tết này súc em winner x , máy thím cho em ý kiến với <img src="/images/smilies/Off/pudency.gif" border="0" alt title="Pudency" class="inlineimg">\n\t\t';

console.log(string.replace(/(\\t\\n|\\n|\\t)/, ""));

The result is the same.

Nguyen Hoang

1 Answer

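Keep it simple: use a character class with single backslashes, so it matches the actual newline and tab characters, and add the g flag so every occurrence is replaced, not just the first:

.replace(/[\n\t]/g, '')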
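
To see why the escaping matters, here is a minimal sketch. It assumes the string holds real newline and tab characters, as in the test file from the question, where \n inside a quoted JavaScript string is an escape sequence. In a regex literal, \n matches a newline character, while \\n matches a literal backslash followed by the letter n. The variable names and the sample string are made up for illustration.

```javascript
// Assumption: the input contains real newline/tab characters, not backslash text.
const input = '\n\t\t\tsome text <img src="/a.gif">\n\t\t';

// Double backslashes look for a literal backslash followed by "n" or "t".
// This string contains none, so nothing is replaced, even with the "g" flag.
const unchanged = input.replace(/(\\t\\n|\\n|\\t)/g, '');

// Single backslashes match the control characters themselves;
// the "g" flag replaces every match instead of only the first one.
const cleaned = input.replace(/[\n\t]/g, '');

console.log(JSON.stringify(unchanged)); // still shows \n and \t
console.log(JSON.stringify(cleaned));   // "some text <img src=\"/a.gif\">"
```

If the text really did contain backslash sequences as visible characters (for example after passing it through JSON.stringify), the double-backslash pattern would be the one that matches, which may be why it appeared to work on regex101.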

Apolo