
I'm having a problem recoding my data in R.

I have a numeric variable called `timing_spend` that contains continuous values, and I want to recode those values into groups stored as a factor.

A sample of the data is shown below:

    timng_spend
     1
    34
     2
    45
     2
     8
    22
    10
    28
    62
    13
    16
    58
    49
    25
    69
    52
    71
    10
    21
     1
    ....etc

The R code I am using is shown below:

    group_time = function(timing_spend) {
      if (timing_spend >= 0 & timing_spend <= 12) {
        return('0-12 Month')
      } else if (timing_spend > 12 & timing_spend <= 24) {
        return('12-24 Month')
      } else if (timing_spend > 24 & timing_spend <= 48) {
        return('24-48 Month')
      } else if (timing_spend > 48 & timing_spend <= 60) {
        return('48-60 Month')
      } else if (timing_spend > 60) {
        return('> 60 Month')
      }
    }

    assignment$time_group = sapply(assignment$timing_spend, group_time)
    assignment$time_group = as.factor(assignment$time_group)

When I check the result with the `str` function, it shows `Factor w/ 5 levels "> 60 Month","0-12 Month",..`, with the levels coded as 1, 2, 3, and so on.

That is not what I was trying to do: I want "> 60 Month" to be level 5, not level 1.

Can anyone help me change that? Or is this just how R automatically orders factor levels? In the plot I want to show, tenure is the timing variable explained above (I just renamed it). As you can see, the order of the factor levels there is weird: I want to move "> 60 Month" to the far right, which means it should be level 5, not 1.
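A minimal sketch of one way around this, assuming the recoding function from the question is kept as-is: `as.factor()` builds its levels from the sorted unique strings, so their order depends on how the labels sort as text, not on the intervals they describe. Calling `factor()` with an explicit `levels` vector sets the coding order directly. The small `assignment` data frame and the `time_levels` vector below are hypothetical stand-ins built from a few of the sample values.

```r
# Recoding function from the question (same logic)
group_time <- function(timing_spend) {
  if (timing_spend >= 0 & timing_spend <= 12) {
    return('0-12 Month')
  } else if (timing_spend > 12 & timing_spend <= 24) {
    return('12-24 Month')
  } else if (timing_spend > 24 & timing_spend <= 48) {
    return('24-48 Month')
  } else if (timing_spend > 48 & timing_spend <= 60) {
    return('48-60 Month')
  } else if (timing_spend > 60) {
    return('> 60 Month')
  }
}

# Hypothetical stand-in for the question's data frame
assignment <- data.frame(timing_spend = c(1, 34, 2, 45, 62, 13, 58, 71))

# Desired level order, from shortest to longest tenure (hypothetical name)
time_levels <- c('0-12 Month', '12-24 Month', '24-48 Month',
                 '48-60 Month', '> 60 Month')

assignment$time_group <- sapply(assignment$timing_spend, group_time)
# factor() with explicit levels, instead of as.factor(), fixes the coding order
assignment$time_group <- factor(assignment$time_group, levels = time_levels)

str(assignment$time_group)
```

With this, "> 60 Month" is coded as 5. The same `factor(x, levels = ...)` call also works on a factor that was already created with `as.factor()`, so an existing column can be reordered without recoding it.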

PS: I didn't provide a data sample here because I thought it might not be needed.

Marcus
  • Could you please add part of your data and what you expect? – Mar 24 '18 at 17:13
  • @Alice Hi Alice, my recoding code is correct. What I expected is that the factor variable I created would show up like this when I check my data with the `str` function: Factor w/ 5 levels "0-12 Month","12-24 Month","24-48 Month","48-60 Month",... coded as 1, 2, 3, etc. That means "> 60 Month" should be 5, not 1. However, when I recode my variable with my R code, it shows "> 60 Month" as 1. – Marcus Mar 24 '18 at 17:15
  • Hi. If you give an example of the output, that would be very helpful for me and everyone else. – Mar 24 '18 at 17:17
  • You might find it easier to use `cut` to cut a continuous variable into ordered factors by range. – Andrew Gustar Mar 24 '18 at 17:20
  • Possible duplicate of [R - Cut by Defined Interval](https://stackoverflow.com/questions/5746544/r-cut-by-defined-interval) – Eric Fail Mar 24 '18 at 17:21
  • @AndrewGustar Hi, can you be more specific? – Marcus Mar 24 '18 at 17:22
  • @Alice, can you check my post again? I uploaded my graph, so it may be easier to understand. – Marcus Mar 24 '18 at 17:31
  • @EricFail. Again we disagree about duplication. I'm seeing this as a question about ways to get around the default ordering of factor variables, which follows the alphabetical sorting of their level names. The questioner wants "> 60 Months" to appear last in the `str` output. – IRTFM Mar 24 '18 at 17:58
  • @EricFail Yes, that is exactly what I wanted to ask! Sorry, I couldn't find a good way to describe it. So, how do we solve this? – Marcus Mar 24 '18 at 18:05
  • I agree with you @42 – Marcus Mar 24 '18 at 18:08
  • @Marcus. It's generally a very poor idea to reply to requests for data by saying it's not needed. It's almost always needed. I generally pass over and downvote questions where people claim "no data needed", because I think such behavior fits the downvote-able "low effort" criterion. – IRTFM Mar 24 '18 at 18:12
  • @42 Sorry about that, I'll keep it in mind next time. – Marcus Mar 24 '18 at 18:15
  • My bad. I've redacted the close vote. – Eric Fail Mar 24 '18 at 19:09

1 Answer


Instead of using `if () {} else {}`, which is generally the wrong approach for R data-management tasks (`if` is not vectorized, which is why the question needed `sapply`), learn to use `cut` or `findInterval`. I didn't wrap this in a new function because `cut` is already defined, but if you wanted a specific, narrowly defined function to do just this partitioning, you could clearly write one.
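For the `findInterval` route, a minimal sketch, assuming the intervals should match the question's `if` conditions exactly (0 to 12 inclusive, then each upper bound inclusive). `left.open = TRUE` makes the intervals open on the left and closed on the right, and in that mode `rightmost.closed = TRUE` closes the leftmost interval instead, so 0 falls in the first group. The `time_levels` name and the choice of `0` rather than `-Inf` as the first break are assumptions for illustration.

```r
# Sample values from the question
timng_spend <- c(1, 34, 2, 45, 2, 8, 22, 10, 28, 62, 13, 16, 58, 49, 25,
                 69, 52, 71, 10, 21, 1)

# Hypothetical label vector, in the order the groups should appear
time_levels <- c('0-12 Month', '12-24 Month', '24-48 Month',
                 '48-60 Month', '> 60 Month')

# Intervals [0,12], (12,24], (24,48], (48,60], (60,Inf]
idx <- findInterval(timng_spend, c(0, 12, 24, 48, 60, Inf),
                    left.open = TRUE, rightmost.closed = TRUE)

# Interval numbers 1..5 become the factor codes; anything outside the breaks
# (e.g. a negative value, which gives 0) becomes NA
time_group <- factor(idx, levels = 1:5, labels = time_levels)
table(time_group)
```

Mapping through `factor(idx, levels = 1:5, labels = ...)` rather than indexing `time_levels[idx]` avoids silently dropping elements when `findInterval` returns 0.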

    (group_time = cut(timng_spend, breaks = c(0, 12, 24, 48, 60, Inf),
                      labels = c('0-12 Month', '12-24 Month', '24-48 Month',
                                 '48-60 Month', ">60 Months")))
     [1] 0-12 Month  24-48 Month 0-12 Month  24-48 Month 0-12 Month  0-12 Month
     [7] 12-24 Month 0-12 Month  24-48 Month >60 Months  12-24 Month 12-24 Month
    [13] 48-60 Month 48-60 Month 24-48 Month >60 Months  48-60 Month >60 Months
    [19] 0-12 Month  12-24 Month 0-12 Month
    Levels: 0-12 Month 12-24 Month 24-48 Month 48-60 Month >60 Months

If you do it this way, any graphs should come out correctly (to your eyes), because they will adopt the ordering of the factor's levels attribute.
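A short sketch of that point, using a subset of the sample values and the same `cut` call: the integer codes and the output of `table()` both follow the level order, so ">60 Months" is coded 5 and comes last. A bar chart of the table, such as `barplot(table(group_time))`, draws its bars left to right in the same order.

```r
# Subset of the question's sample values
timng_spend <- c(1, 34, 2, 45, 62, 13, 58, 71)

group_time <- cut(timng_spend, breaks = c(0, 12, 24, 48, 60, Inf),
                  labels = c('0-12 Month', '12-24 Month', '24-48 Month',
                             '48-60 Month', ">60 Months"))

# Integer codes follow the level order, so ">60 Months" is coded 5
as.integer(group_time)

# table() lists the groups in level order, not alphabetical order
table(group_time)
```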

IRTFM
  • I tried your approach, but your code turns any value larger than 60 into NA, and that group did not show up when I checked with the `str` function. Also, although you defined the levels, the data show up as intervals, e.g. (1,12). – Marcus Mar 24 '18 at 18:21
  • Now you see a perfect example of where having data would have supported testing the code. I'll wait patiently for you to improve your question; I don't see it as the responsibility of respondents to make test cases. The `cut` function does have some potential gotchas regarding how values at the ends of the breaks vector get handled. I generally use the `findInterval` function with a breaks vector that is flanked on either end with `-Inf` and `Inf`. – IRTFM Mar 24 '18 at 18:24
  • @42 I uploaded a sample of the variable. Is this what you were asking for? – Marcus Mar 24 '18 at 18:30
  • @42 I think I solved it by just adding Inf as the last break in the `cut` function. However, another issue popped up: I want to convert the group_time variable into a factor variable because I need it in the following analysis, and I can't do that when I use the `cut` function. Do you have any idea? – Marcus Mar 24 '18 at 18:46
  • I used `scan` to read the values in (and used your spelling) and found a couple of errors (including an extraneous cut-point), but I cannot find anything in your question about a group_time variable. – IRTFM Mar 24 '18 at 18:54
  • @42 I think I solved the issues, and the result was a factor variable after all. – Marcus Mar 24 '18 at 19:04
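The end-point gotchas discussed in these comments can be checked directly. A small sketch on made-up values (not the question's data): without `Inf` as the last break, values above 60 fall outside every interval and become `NA`; with the default `right = TRUE`, a value of exactly 0 is also `NA` unless `include.lowest = TRUE` is set; and when `labels` is not supplied, the levels are interval strings such as `(0,12]`. It also confirms that `cut()` already returns a factor, so no extra conversion step is needed.

```r
# Made-up values chosen to hit the edges of the breaks
x <- c(0, 5, 12, 60, 62, 71)

# Last break is 60: 62 and 71 lie outside every interval, and 0 is not in (0,12]
no_inf <- cut(x, breaks = c(0, 12, 24, 48, 60))
print(no_inf)

# Adding Inf covers everything above 60; include.lowest = TRUE closes the first
# interval on the left, so 0 lands in [0,12]
with_inf <- cut(x, breaks = c(0, 12, 24, 48, 60, Inf), include.lowest = TRUE)
print(with_inf)

# cut() returns a factor, so no separate as.factor() step is needed
is.factor(with_inf)
```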