I have a .csv file that contains text in several foreign languages (Russian, Japanese, Arabic, etc.). For example, a column entry looks like this: `<U+03BA><U+03BF><U+03C5>`. I want to remove the rows that contain this kind of text. I tried various solutions, all of them with no result:
library(readr)
test_fb5 <- read_csv('test_fb_data.csv', locale = locale(encoding = 'UTF-8'))
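To show what "no result" means: even a direct search for the escape text appears to come up empty. A rough sketch of the check I have in mind (the address column name is taken from the capture below):

# sketch only: does the literal "<U+xxxx>" escape text occur anywhere in the address column?
any(grepl("<U\\+[0-9A-Fa-f]{4}>", test_fb5$address))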
I also tried applying replacements to a single column:
gsub("[<].*[>]", "")` or `sub("^\\s*<U\\+\\w+>\\s*", "")
or
gsub("\\s*<U\\+\\w+>$", "")
It seems that R 4.1.0 does not find the respective characters. I cannot find a way to attach a small chunk of the file here, so here is a capture of it:
address
33085 9848a 33 avenue nw t6n 1c6 edmonton ab canada alberta
33086 1075 avenue laframboise j2s 4w7 sainthyacinthe qc canada quebec
33087 <U+03BA><U+03BF><U+03C5><U+03BD><U+03BF><U+03C5>p<U+03B9>tsa 18050 spétses greece attica region
33088 390 progress ave unit 2 m1p 2z6 toronto on canada ontario
name
33085 md legals canada inc
33086 les aspirateurs jpg inc
33087 p<U+03AC>t<U+03C1>a<U+03BB><U+03B7><U+03C2>patralis
33088 wrench it up plumbing mechanical
category
33085 general practice attorneys divorce family law attorneys notaries
33086 <NA>
33087 mediterranean restaurants fish seafood restaurants
33088 plumbing services damage restoration mold remediation
phone
33085 17808512828
33086 14507781003
33087 302298072134
33088 14168005050
The 3308x values are the row indices of the dataset. Thank you for your time!