I am trying to replace all special characters using a regex, and I am comparing the results between JavaScript (Node.js v10.16.3) and Python (3.7.x). This is the input sentence:
\t kickref, first really multi-level referral program on the сrypto market, has reached over 20 000 users just in 2 days after its start on september 28.
Splitting the sentence into characters, just to inspect the character codes, gives me this array:
'["\\t"," ","k","i","c","k","r","e","f",","," ","f","i","r","s","t"," ","r","e","a","l","l","y"," ","m","u","l","t","i","-","l","e","v","e","l"," ","r","e","f","e","r","r","a","l"," ","p","r","o","g","r","a","m"," ","o","n"," ","t","h","e"," ","с","r","y","p","t","o"," ","m","a","r","k","e","t",","," ","h","a","s"," ","r","e","a","c","h","e","d"," ","o","v","e","r"," ","2","0"," ","0","0","0"," ","u","s","e","r","s"," ","j","u","s","t"," ","i","n"," ","2"," ","d","a","y","s"," ","a","f","t","e","r"," ","i","t","s"," ","s","t","a","r","t"," ","o","n"," ","s","e","p","t","e","m","b","e","r"," ","2","8","."]'
These are the character codes for each character:
'[9,32,107,105,99,107,114,101,102,44,32,102,105,114,115,116,32,114,101,97,108,108,121,32,109,117,108,116,105,45,108,101,118,101,108,32,114,101,102,101,114,114,97,108,32,112,114,111,103,114,97,109,32,111,110,32,116,104,101,32,1089,114,121,112,116,111,32,109,97,114,107,101,116,44,32,104,97,115,32,114,101,97,99,104,101,100,32,111,118,101,114,32,50,48,32,48,48,48,32,117,115,101,114,115,32,106,117,115,116,32,105,110,32,50,32,100,97,121,115,32,97,102,116,101,114,32,105,116,115,32,115,116,97,114,116,32,111,110,32,115,101,112,116,101,109,98,101,114,32,50,56,46]'
The problem is caused by the letter 'c' in the word crypto. Notice that its code in the array is 1089, which is outside the ASCII range, whereas the 'c' in kickref has code 99.
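For reference, the two arrays above can be reproduced with a short sketch like the one below. It uses a shortened copy of the sentence and writes the odd character as the escape `\u0441`, on the assumption that code 1089 (0x441) is what the original text contains. The variable names are only illustrative.

```javascript
// Shortened sample; \u0441 stands in for the character with code 1089.
const text = '\t kickref, first really multi-level referral program on the \u0441rypto market';

// Split by code point, then map each character to its numeric code.
const chars = Array.from(text);
const codes = chars.map((ch) => ch.codePointAt(0));

// Anything above 127 is outside the ASCII range.
const nonAscii = chars.filter((ch) => ch.codePointAt(0) > 127);

console.log(JSON.stringify(chars));
console.log(JSON.stringify(codes));
console.log(nonAscii.map((ch) => ch.codePointAt(0)));
```

Filtering on codes above 127 is a quick way to spot look-alike characters that are not plain ASCII letters.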
In JS, my code to do the replacement looks as follows:
const regexSpecialCharacters = new RegExp(/\W/, 'g');
text.replace(regexSpecialCharacters, ' ');
This yields the following sentence:
kickref first really multi level referral program on the rypto market has reached over 20 000 users just in 2 days after its start on september 28
The letter 'c' got removed (it was replaced with a space).
In Python, my regex to do the exact same thing looks like this:
import re
regex_special_characters = re.compile(r'\W')
regex_special_characters.sub(' ', text)
This gives me the following output:
kickref first really multi level referral program on the сrypto market has reached over 20 000 users just in 2 days after its start on september 28
Here the letter 'c' has NOT been removed by Python.
Can anyone kindly tell me why the two behave differently? I don't want JS to remove the letter 'c' either. What should I do?
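One direction that may be worth trying, sketched under assumptions rather than offered as a confirmed fix: in JavaScript, `\w` only covers `[A-Za-z0-9_]`, while Python 3's `re` treats Unicode letters and digits as word characters for `str` patterns. Code 1089 is U+0441, a Cyrillic letter, so JS `\W` matches it. A Unicode property escape with the `u` flag should keep it. The character class below only approximates Python's `\w`, and `replaceSpecialCharacters` is a hypothetical helper name.

```javascript
// Match anything that is NOT a Unicode letter (\p{L}), number (\p{N}) or "_".
// The "u" flag is required for \p{...} escapes to be recognised.
const unicodeSpecialCharacters = /[^\p{L}\p{N}_]/gu;

// Hypothetical helper: replace every "special" character with a space.
function replaceSpecialCharacters(input) {
  return input.replace(unicodeSpecialCharacters, ' ');
}

// Shortened sample; \u0441 is the character with code 1089.
const sample = 'on the \u0441rypto market, 20 000 users.';

const asciiOnly = sample.replace(/\W/g, ' ');            // drops \u0441
const unicodeAware = replaceSpecialCharacters(sample);   // keeps \u0441

console.log(/\w/.test('\u0441')); // false: not in [A-Za-z0-9_]
console.log(asciiOnly);
console.log(unicodeAware);
```

If ASCII-only behaviour were wanted on both sides instead, Python's `re.ASCII` flag restricts `\w` and `\W` to ASCII, but that would remove the letter in Python too.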