
Within a dataframe (call it data), I have a variable (call it var) that takes on values such as "John Smith", "Adam Olson", "Peter Bradley", etc.

sapply(data, mode) reports that var is numeric, and as.numeric(data$var) returns numbers (1, 2, 3, ..., as expected).
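Here is a minimal sketch of why this happens. It assumes var is a factor, which was the default for string columns in data.frame() before R 4.0.0, and the data frame below is a hypothetical stand-in. A factor stores integer codes plus a vector of level labels. That is why mode() reports "numeric" and as.numeric() returns the codes. The codes follow the alphabetical order of the levels, not the row order.

```r
# Hypothetical stand-in for the asker's data frame. stringsAsFactors = TRUE
# reproduces the pre-R-4.0.0 default that stored string columns as factors.
data <- data.frame(var = c("John Smith", "Adam Olson", "Peter Bradley"),
                   stringsAsFactors = TRUE)

class(data$var)       # "factor"
mode(data$var)        # "numeric": a factor is stored as integer codes
levels(data$var)      # "Adam Olson" "John Smith" "Peter Bradley" (sorted)
as.numeric(data$var)  # 2 1 3: each value's position within levels(data$var)
sapply(data, mode)    # var = "numeric", matching what the question observed
```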

When I split the variable into two using stringr::str_split_fixed (as explained in http://rbyexamples.blogspot.com/2015/07/r-for-stata-users-part-3.html, Task #14) and call the resulting variables firstname and lastname, R tells me that they are character. Hence, I can't use as.numeric on them.

If I read "How to convert a data frame column to numeric type?" correctly, transform won't work. Thus, given the way I've split var, there's no way to convert the new variables to numeric.

Is there a way of splitting the variable such that it can be converted into numeric more easily?
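Here is one possible approach. It assumes each name is "first last" separated by a single space. Split the labels, then wrap each piece in factor(). Each result is a factor again, so as.numeric() gives level codes. Base R sub() stands in for stringr::str_split_fixed so the sketch needs no package, and the data frame is hypothetical.

```r
# Hypothetical stand-in for the asker's data; var is a factor, as in the question
data <- data.frame(var = c("John Smith", "Adam Olson", "Peter Bradley"),
                   stringsAsFactors = TRUE)

# Split the labels rather than the codes. Base R sub() stands in for
# stringr::str_split_fixed: drop everything from the first space onward to
# get the first name, and everything up to the last space to get the last name.
fullnames <- as.character(data$var)
data$firstname <- factor(sub(" .*$", "", fullnames))
data$lastname  <- factor(sub("^.* ", "", fullnames))

# Both new columns are factors, so as.numeric() returns their level codes
as.numeric(data$firstname)  # 2 1 3 (levels: Adam, John, Peter)
as.numeric(data$lastname)   # 3 2 1 (levels: Bradley, Olson, Smith)
```

You may prefer to keep stringr::str_split_fixed(fullnames, " ", 2), which returns a two-column character matrix. Wrapping a column of that matrix in factor() should have the same effect, for example factor(parts[, 1]). Here "parts" is just a hypothetical name for that matrix.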

wwl
  • I'm not able to understand what you want to do. You start with a `factor` column, then you use `strsplit` to get character vars. Then you want to turn character vars into numeric?! – Frash Jul 03 '15 at 06:35
  • How about providing a small reproducible example? – Roman Luštrik Jul 03 '15 at 06:42

1 Answer


I split "var" into "firstname" and "lastname" as follows:

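# stringsAsFactors = TRUE is needed in R >= 4.0.0, where it defaults to FALSE
df <- data.frame(var = c("Adam Olson", "John Smith", "Peter Olson"),
                 stringsAsFactors = TRUE)
dfFirst <- df
dfLast  <- df
colnames(dfFirst) <- "firstname"
colnames(dfLast)  <- "lastname"

L <- levels(df$var)
firstLevels <- character(length(L))
lastLevels  <- character(length(L))

for (n in seq_along(L))
{
  # positions of all spaces in the n-th level
  i <- which(strsplit(L[[n]], "")[[1]] == " ")
  # first name: everything before the first space
  firstLevels[n] <- substr(L[[n]], 1, i[1] - 1)
  # last name: everything after the last space
  lastLevels[n]  <- substr(L[[n]], i[length(i)] + 1, nchar(L[[n]]))
}

# Assign all new levels at once; duplicated levels (here "Olson") are merged
levels(dfFirst$firstname) <- firstLevels
levels(dfLast$lastname)   <- lastLevels

dfFirstLast <- cbind(dfFirst, dfLast)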

This is very unaesthetic, but the variables remain factors, so as.numeric() still returns numbers:

> as.numeric(dfFirstLast$"firstname")
[1] 1 2 3
> as.numeric(dfFirstLast$"lastname")
[1] 1 2 1
> as.character(dfFirstLast$"firstname")
[1] "Adam"  "John"  "Peter"
> as.character(dfFirstLast$"lastname")
[1] "Olson" "Smith" "Olson"
> as.numeric(dfFirstLast$"firstname") + 8
[1]  9 10 11
> as.numeric(dfFirstLast$"lastname") / 7
[1] 0.1428571 0.2857143 0.1428571
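The same level-rewriting idea can also be written without the explicit loop. This is a swapped-in variant, not the code above. It assumes every name contains at least one space, and it applies base R sub() to levels() directly. Because levels<- merges duplicated new levels such as "Olson", the codes should match the output above.

```r
# stringsAsFactors = TRUE keeps var a factor in R >= 4.0.0
df <- data.frame(var = c("Adam Olson", "John Smith", "Peter Olson"),
                 stringsAsFactors = TRUE)

firstname <- df$var
lastname  <- df$var
# Replace each level with its first word / last word; equal results are merged
levels(firstname) <- sub(" .*$", "", levels(firstname))  # text before first space
levels(lastname)  <- sub("^.* ", "", levels(lastname))   # text after last space
dfFirstLast <- data.frame(firstname, lastname)

as.numeric(dfFirstLast$firstname)  # 1 2 3
as.numeric(dfFirstLast$lastname)   # 1 2 1 (the two "Olson" levels merged)
```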
mra68