I am trying to find a way to do the following with a PowerShell script.
- For each line in text file, check if line contains non-ASCII characters
- If line contains non-ASCII characters, output to separate file
- If line does not contain non-ASCII characters, skip to next line
By non-ASCII characters, I mean characters you can't type on a plain English keyboard, e.g. accented characters, characters from other languages, etc.
Sample Data
- 张伟
- குழந்தைகளுக்கான பெயர்கள்
- 日本人の氏名
- Full Name
- Léna Rémi
Output Data
- 张伟
- குழந்தைகளுக்கான பெயர்கள்
- 日本人の氏名
- Léna Rémi
I found a regex in other threads for removing non-ASCII characters, but I couldn't get it to work.
Please help!
** EDIT ** Thanks everyone for the help! I managed to do what I wanted with the script below.
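    # $source and $output hold the input and output file paths
    $nonASCII = "[^\x00-\x7F]"
    foreach ($line in [System.IO.File]::ReadLines($source)) {
        # Keep only lines with at least one character outside the ASCII range
        if ($line -cmatch $nonASCII) {
            $line | Out-File $output -Append
        }
    }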
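
For anyone who prefers a pipeline, the same check could probably be written as below. This is only a sketch: the function name Get-NonAsciiLine is made up for illustration, and it assumes the input file is UTF-8 encoded.

```powershell
# Hypothetical helper (the name is not a built-in cmdlet).
# Emits every line of a UTF-8 text file that contains at least one
# character outside the 7-bit ASCII range (U+0000 to U+007F).
function Get-NonAsciiLine {
    param(
        [Parameter(Mandatory)]
        [string] $Path
    )

    # Assumption: the file is UTF-8. Change -Encoding if yours is not.
    Get-Content -LiteralPath $Path -Encoding UTF8 |
        Where-Object { $_ -cmatch '[^\x00-\x7F]' }
}
```

It would be used with the same $source and $output variables as the script above, writing all matches in one go instead of appending line by line:

    Get-NonAsciiLine -Path $source | Set-Content -LiteralPath $output -Encoding UTF8

The encoding is spelled out on both ends because the defaults differ between Windows PowerShell 5.1 and PowerShell 7, and a wrong guess tends to mangle exactly the characters being searched for.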