7

I am trying to recode a factor variable in R using the following code:

library(car)
napier_captureComplexity=recode(napier$a_SpatialConnectivity,"'1 - Very simple and clear:     no diagrams, single sheets'=1;'2 - Reasonably simple: some simple diagrams or second sheets'=2;'3 - Reasonably complex: multiple diagrams or sheets but can be followed'=3;'4 - Moderately complex: multiple diagrams and sheets'=4;'5 - Very complex'=5;",as.factor.result=FALSE)

And get the following error message:

Error in parse(text = range[[1]][1]) : 
  <text>:1:1: unexpected INCOMPLETE_STRING
1: '4 - Moderately complex

With a ^ below the number 4

I'm not sure what is causing this. I wondered whether the : characters in the strings were the problem, but I am not using c(), and the same code runs fine on other factors in the dataset that have similar string values.

Any help is appreciated!

thelatemail
user3746990

3 Answers

5

It's actually because of the ":" in your descriptions. Internally, recode relies on some odd eval and strsplit calls. It ends up splitting on ":" because the colon is special in recode's syntax (it denotes a range of values), and there appears to be no way to escape it.

I'm assuming napier$a_SpatialConnectivity is a factor with those levels. If so, you can recode the variable by setting the levels explicitly in the factor() call and converting the result to numeric.

mylevels <- c("1 - Very simple and clear:     no diagrams, single sheets",
  "2 - Reasonably simple: some simple diagrams or second sheets", 
  "3 - Reasonably complex: multiple diagrams or sheets but can be followed", 
  "4 - Moderately complex: multiple diagrams and sheets", 
  "5 - Very complex")

napier_captureComplexity <- as.numeric(factor(napier$a_SpatialConnectivity, levels=mylevels))

This assigns the codes 1 to 5 in the order the levels appear in mylevels, which is the same mapping you were trying to get from recode.
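As a minimal sketch of this approach on made-up data (the vector `x` below is hypothetical and only stands in for napier$a_SpatialConnectivity), the following base R code shows the conversion. It also shows that any value not matching a level exactly, for example one with a trailing space, becomes NA:

```r
# Level labels, in the order that should map to 1..5
mylevels <- c("1 - Very simple and clear:     no diagrams, single sheets",
  "2 - Reasonably simple: some simple diagrams or second sheets",
  "3 - Reasonably complex: multiple diagrams or sheets but can be followed",
  "4 - Moderately complex: multiple diagrams and sheets",
  "5 - Very complex")

# Hypothetical stand-in for napier$a_SpatialConnectivity;
# the last value has a trailing space, so it matches no level
x <- factor(c(mylevels[4], mylevels[1], mylevels[5], "5 - Very complex "))

# Position of each value within mylevels; unmatched values become NA
codes <- as.numeric(factor(x, levels = mylevels))
print(codes)   # 4 1 5 NA
```

If the result is all NA, a likely cause is that the strings in the levels vector do not match the data character for character, including whitespace.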

MrFlick
  • I followed your method, all the values became `NA`... Are you sure your method works? – JW.ZG Feb 16 '16 at 22:22
  • Yes, I'm sure. If you define `napier<-data.frame(a_SpatialConnectivity=sample(mylevels, 20, replace=T))` after you define `mylevels`, the code will run and produce numeric values, not NA. – MrFlick Feb 16 '16 at 22:25
  • I think there is something wrong in your code. If you use `napier<-`, the result is that *napier will only have one column*. In my case, the column is `adult$salary`. Following your method, `mylevels<-c(" <=50K"," >50K") data.frame(salary=sample(mylevels, 938, replace=T))` generates a data.frame with 938 rows whose values are still " <=50K" and " >50K", not **1** and **2**, and it has nothing to do with the original data in `adult$salary`. I have been working on this for more than an hour. – JW.ZG Feb 16 '16 at 23:13
  • I don't understand what you are doing. The answer I provided does not change the original data frame. If you are having trouble, perhaps you should open your own question and include a reproducible example. Trying to answer new questions in comments is difficult. – MrFlick Feb 16 '16 at 23:16
2

recode seems to interpret : as the range operator even when it appears inside a quoted string, so the string is cut off at the : and what remains fails to parse. For example:

x = c("a","b","c")
recode(x, "'a'=1; 'b'=2; 'c'=3;")
[1] 1 2 3

but

x = c("a:d","b","c")
recode(x, "'a:d'=1; 'b'=2; 'c'=3;")
Error in parse(text = range[[1]][1]) : 
  <text>:1:1: unexpected INCOMPLETE_STRING
1: 'a
    ^

In every example I've tried the string terminates at the :, causing an error.
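The failure can be reproduced in base R without car. This is only an illustration of the mechanism, assuming the specification is split on ":" before each piece is parsed; it is not car's actual source code, and the variable names are made up:

```r
# One term of a recode specification, with a colon inside the quoted string
term <- "'a:d'"

# Splitting on ":" ignores the quotes and leaves an unterminated string
left <- strsplit(term, ":", fixed = TRUE)[[1]][1]
print(left)   # "'a"

# Parsing the fragment fails because the opening quote is never closed;
# tryCatch captures the error message instead of stopping the script
result <- tryCatch(parse(text = left), error = function(e) conditionMessage(e))
print(result)
```

The captured message reports the same INCOMPLETE_STRING problem as the recode error above.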

James King
1

Anyone who is in a similar position but working with strings instead of factors should be able to use gsub to remove the colon from the data.

napier_captureComplexity <- gsub(":","",napier$a_SpatialConnectivity)

Then omit the colon from the strings in the recode specification as well, and the recode should work.
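A minimal sketch of that clean-up step, using a made-up character vector in place of the real column. The match() lookup is a base R stand-in so that the example runs without car; with car loaded, the colon-free strings could be passed to recode instead:

```r
# Hypothetical character data standing in for napier$a_SpatialConnectivity
x <- c("4 - Moderately complex: multiple diagrams and sheets",
       "5 - Very complex",
       "2 - Reasonably simple: some simple diagrams or second sheets")

# Remove every colon; fixed = TRUE treats ":" as a literal character
x_clean <- gsub(":", "", x, fixed = TRUE)

# Colon-free labels and the numeric code assigned to each
labels <- c("2 - Reasonably simple some simple diagrams or second sheets",
            "4 - Moderately complex multiple diagrams and sheets",
            "5 - Very complex")
codes <- c(2, 4, 5)

# Look up each cleaned value in labels and take the corresponding code
recoded <- codes[match(x_clean, labels)]
print(recoded)   # 4 5 2
```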

bcarothers