
I have a dataset that looks like this:

Observation  Age  Family members  Income
1            25   2               3k
2            29   4               1k
3            32   Six             3k
3            32   Five            5k

I'm a Stata user, so I've been stuck on this problem for a while. How can I convert the variable Family members into a numeric variable, given that it contains both numeric and character values?

  • To parse numbers written in English, please see [Convert written number to number in R](https://stackoverflow.com/questions/18332463/convert-written-number-to-number-in-r) – Henrik Aug 03 '21 at 17:35
  • argh... next time I search first before writing a similar function... – Martin Gal Aug 03 '21 at 17:48

2 Answers

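Calling as.numeric directly on the column converts the non-numeric elements to NA and issues a warning. To keep those values, we can first match the spelled-out words against as.english(1:9), fall back to the original value with coalesce where there is no match, and only then apply as.numeric: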

library(english)
library(dplyr)
df1$Family_members <- with(df1, as.numeric(coalesce(as.character(match(toupper(Family_members), 
         toupper(as.english(1:9)))), Family_members)))

-output

 df1
  Observation Age Family_members Income
1           1  25              2     3k
2           2  29              4     1k
3           3  32              6     3k
4           3  32              5     5k

data

df1 <- structure(list(Observation = c(1L, 2L, 3L, 3L), Age = c(25L, 
29L, 32L, 32L), Family_members = c("2", "4", "Six", "Five"), 
    Income = c("3k", "1k", "3k", "5k")), class = "data.frame", row.names = c(NA, 
-4L))
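
To see what each piece of the one-liner contributes, here is a minimal base R sketch of the same idea. It does not use the english or dplyr packages: the hard-coded number_words vector and the ifelse fallback are stand-ins I am assuming for as.english(1:9) and coalesce, and the variable names are only illustrative.

```r
# Illustrative input: the same values as Family_members in df1
x <- c("2", "4", "Six", "Five")

# as.numeric() alone turns the spelled-out values into NA (with a warning,
# suppressed here so the sketch runs quietly)
plain <- suppressWarnings(as.numeric(x))

# Position of each value in a vector of number words (assumed stand-in for
# as.english(1:9)); digit strings have no match and give NA
number_words <- c("one", "two", "three", "four", "five",
                  "six", "seven", "eight", "nine")
idx <- match(tolower(x), number_words)

# Use the matched position where there is one, otherwise the parsed digits
result <- ifelse(is.na(idx), plain, idx)
print(result)
```

Here plain is NA exactly where idx is not, so combining the two fills every element for this input.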
akrun

Assuming a family doesn't have more than 19 members, you could use a custom function like

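# Convert a single value, either a number word ("six") or a digit string ("6"),
# to a number. Expects one element at a time; use sapply() for a whole column.
word2num <- function(word) {
  word <- tolower(word)
  
  # Position in the word list equals the number the word stands for
  output <- match(word, 
                  c("one", "two", "three", "four", "five", "six", "seven", 
                    "eight", "nine", "ten", "eleven", "twelve", "thirteen", 
                    "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen")
                  )
  
  # No matching word: treat the input as a digit string instead
  if (is.na(output)) {
    return(as.numeric(word))
  }
  
  output
}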

and apply it to your data:

df$Family_members <- sapply(df$Family_members, word2num)

This returns

  Observation Age Family_members Income
1           1  25              2     3k
2           2  29              4     1k
3           3  32              6     3k
4           3  32              5     5k
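
If the column is long, the same lookup can be applied to the whole vector at once instead of element by element. The following is only a sketch under the same 19-member assumption: word2num_vec is a hypothetical name, and the named lookup vector and trimws call are my additions rather than part of the function above.

```r
# Hypothetical vectorised variant: takes a character vector, returns numerics
word2num_vec <- function(x) {
  words <- c("one", "two", "three", "four", "five", "six", "seven",
             "eight", "nine", "ten", "eleven", "twelve", "thirteen",
             "fourteen", "fifteen", "sixteen", "seventeen", "eighteen",
             "nineteen")
  # Named vector: names are the words, values are the numbers 1..19
  lookup <- setNames(seq_along(words), words)

  key <- tolower(trimws(x))
  from_words  <- unname(lookup[key])               # NA where key is not a word
  from_digits <- suppressWarnings(as.numeric(key)) # NA where key is not numeric

  # Prefer the word lookup, fall back to the parsed digits
  ifelse(is.na(from_words), from_digits, from_words)
}

print(word2num_vec(c("2", "4", "Six", "Five")))
```

Values that are neither a listed word nor a digit string come back as NA, so it may be worth checking the result with anyNA() before relying on it.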
Martin Gal