I have a variable in my dataset that contains three types of values: text (strings), numbers, and missing values. All of them are currently stored in a single factor. I want to distinguish the text values from the numeric values and the missing values. How can I do that?

Data <- data.frame(x=c("100","20","home","","30"))

There are three types of values here: numbers, text, and missing values. I want to find the positions of all the text values.

Xin Chang

1 Answer

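You can extract the indices of the text, numeric, and missing values separately with regular expressions:

grep("[[:alpha:]]+", Data$x)
# [1] 3

grep("[0-9]+", Data$x)
# [1] 1 2 5

grep("^\\s*$", Data$x)
# [1] 4

To get the actual values instead of the indices, use value = TRUE:

grep("[[:alpha:]]+", Data$x, value = TRUE)
# [1] "home"

grep("[0-9]+", Data$x, value = TRUE)
# [1] "100" "20"  "30"

grep("^\\s*$", Data$x, value = TRUE)
# [1] ""

[[:alpha:]]+ matches one or more alphabetic characters. The double brackets matter: [:alpha:] is only a character class inside a bracket expression, and a bare [:alpha:] is read as the set of literal characters ":", "a", "l", "p", and "h".

[0-9]+ matches one or more digits. The pattern is not anchored, so it also matches a string that merely contains a digit, such as "abc1"; use ^[0-9]+$ to require that the whole value consists of digits.

^ matches the start of the string, $ matches the end of the string, and \\s* matches zero or more whitespace characters, so ^\\s*$ matches only strings that are empty or consist entirely of whitespace. A true NA is not matched by any of these patterns; use which(is.na(Data$x)) to find those.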
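If it is more convenient to label every element in one pass rather than run three separate grep calls, the same patterns can be combined in a small helper. This is only a sketch: classify_values is a hypothetical name, not a base R function. It assumes that "numeric" means an unsigned integer or decimal such as "100" or "3.5", and that both empty strings and NA count as missing.

```r
# Hypothetical helper (not part of base R): label each element of a
# factor or character vector as "numeric", "missing", or "text".
classify_values <- function(x) {
  x <- as.character(x)                  # factors become their labels
  kind <- rep("text", length(x))        # default: anything else is text
  # Assumption: numeric means unsigned digits with an optional decimal part
  kind[grepl("^\\s*[0-9]+(\\.[0-9]+)?\\s*$", x)] <- "numeric"
  # Assumption: NA, empty, and whitespace-only values all count as missing
  kind[is.na(x) | grepl("^\\s*$", x)] <- "missing"
  kind
}

Data <- data.frame(x = c("100", "20", "home", "", "30"),
                   stringsAsFactors = TRUE)
kind <- classify_values(Data$x)
kind
# [1] "numeric" "numeric" "text"    "missing" "numeric"

which(kind == "text")
# [1] 3
```

Because the numeric pattern is anchored at both ends, a mixed value such as "abc1" falls through to "text" instead of being counted as a number.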

acylam