
Let's say I have a data frame stored in justData.

I am trying to remove every character, including spaces and tabs, from one column of justData, so that only the emojis remain.

I tried gsub(), but I quickly realized that I cannot enumerate the Unicode ranges for every type of character that might appear in the data:

# Assigning into justData[] keeps it a data frame; apply() would return a character matrix.
justData[] <- lapply(justData, function(x) gsub("[0-9]", "", x)) # Digits
justData[] <- lapply(justData, function(x) gsub("[[:punct:]]", "", x)) # Punctuation
justData[] <- lapply(justData, function(x) gsub("[A-Za-z]", "", x)) # Standard Latin alphabet ([A-z] also matches [ \ ] ^ _ `)
justData[] <- lapply(justData, function(x) gsub("[\U4E00-\U9FFF\U3000-\U303F]", "", x)) # Common Chinese
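
The approach can also be inverted: instead of deleting each known character class, delete everything that falls outside a set of emoji ranges. The following is only a minimal sketch of that idea. The helper name keep_emoji, the column name text, and the ranges themselves are assumptions made for illustration; the ranges are hand-picked and incomplete (the emoji list keeps changing), and they also let through some symbols that are not emojis.

```r
# Hypothetical, hand-picked ranges; NOT an authoritative emoji list.
emoji_ranges <- paste0(
  "\U0001F300-\U0001FAFF", # pictographs, emoticons, transport, newer symbol blocks
  "\U0001F1E6-\U0001F1FF", # regional indicators (flag sequences)
  "\u2600-\u27BF",         # miscellaneous symbols and dingbats
  "\u200D\uFE0F"           # zero-width joiner and variation selector-16
)

# Hypothetical helper: drop every character that is not in emoji_ranges.
keep_emoji <- function(x) {
  pattern <- paste0("[^", emoji_ranges, "]")
  # perl = TRUE so that code points above U+FFFF are matched as single characters.
  gsub(pattern, "", enc2utf8(as.character(x)), perl = TRUE)
}

# Toy data frame; the column name "text" is an assumption.
justData <- data.frame(
  text = c("Hi \U0001F600 there 123\t\u4F60\u597D \U0001F680!", "no emoji here"),
  stringsAsFactors = FALSE
)

# Only the chosen column is modified, so justData stays a data frame.
justData$text <- keep_emoji(justData$text)
print(justData$text)
```

Because the pattern is a negated character class, spaces, tabs, digits, Latin letters, and Chinese characters are all removed in one pass without listing them.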

Is there an R function that removes everything from this specific column except emojis?

Kosu K.
  • Relevant though probably still slightly inaccurate as the list keeps changing - https://stackoverflow.com/questions/30470079/emoji-value-range – thelatemail Nov 12 '20 at 04:55

0 Answers