I would like to format a variable in R, using round, floor or ceiling. However, I would like to sometimes use floor, sometimes ceiling for different values of the same variable. Is that possible?

My dataframe is data and the variable I want to format is var. These are its values (with frequencies):

Value    |      Freq.
---------|-----------
1        |       1504
1.333333 |        397
1.5      |          9
1.666667 |        612
2        |       2096
2.333333 |       1057
2.5      |         18
2.666667 |       1270
3        |       2913
3.333333 |       1487
3.5      |         35
3.666667 |       1374
4        |       2007
4.333333 |        779
4.5      |         16
4.666667 |        522
5        |       1913
NaN      |        553

My desired result is a variable var2 that looks like this:

Value |      Freq.
------|-----------
1     |       1910
2     |       3783
3     |       5670
4     |       4195
5     |       2451     
NaN   |        553

So, 1.5 and 2.5 are adjusted downward (floor), but 3.5 and 4.5 are adjusted upward (ceiling). The other values are rounded the usual way.

My attempt is this, but it does not work yet:

data$var2 <- format(round(data$var, 1))
if (data$var2 == 1.7||2.7||3.5||3.7||4.5||4.7) {
  data$var2 <- format(ceiling(data$var2))
} else {
  data$var2 <- format(floor(data$var2))
}

I know that there are probably several mistakes in my attempt and would appreciate any help.

PS: What I'm actually looking for is an equivalent for Stata's function egen cut. With that it is very easy to achieve the desired result:

egen var2 = cut(var), at(1, 1.6, 2.6, 3.5, 4.4, 5.1)
recode var2 (1 = 1) (1.6 = 2) (2.6 = 3) (3.5 = 4) (4.4 = 5)
Alina D

2 Answers

You can use the case_when function from the dplyr package for this:

library(dplyr)

data %>% 
  mutate(var2 = case_when(var %in% c(1.5, 2.5) ~ floor(var),
                          var %in% c(3.5, 4.5) ~ ceiling(var),
                          TRUE ~ round(var)))

This returns the following data.frame:

        var var2
1  1.000000    1
2  1.333333    1
3  1.500000    1
4  1.666667    2
5  2.000000    2
6  2.333333    2
7  2.500000    2
8  2.666667    3
9  3.000000    3
10 3.333333    3
11 3.500000    4
12 3.666667    4
13 4.000000    4
14 4.333333    4
15 4.500000    5
16 4.666667    5
17 5.000000    5
18      NaN  NaN

You can customize the conditions as needed.
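Since the question asks for an equivalent of Stata's egen cut, here is a minimal base R sketch using cut() with the same breakpoints as in the Stata call. It assumes the intervals should be closed on the left, which is how I understand egen cut to behave. The vector var holds illustrative values only and stands in for data$var.

```r
# Breakpoints copied from the Stata call in the question.
breaks <- c(1, 1.6, 2.6, 3.5, 4.4, 5.1)

# Illustrative values only (an assumption); substitute data$var here.
var <- c(1, 4/3, 1.5, 5/3, 2.5, 8/3, 3.5, 4.5, 5, NaN)

# right = FALSE makes every interval closed on the left: [a, b).
# labels = FALSE returns the interval index (1 to 5) instead of a factor,
# so the separate recode step from Stata is not needed.
var2 <- cut(var, breaks = breaks, labels = FALSE, right = FALSE)

print(data.frame(var, var2))
```

With these breakpoints 1.5 and 2.5 fall into the lower group while 3.5 and 4.5 fall into the upper one. Note that NaN comes back as NA rather than NaN.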

henhesu

EDIT: this answer is wrong!

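I am not sure this is the desired outcome. It seems to me that you want to round to the nearest integer, except for the values 1.5 and 2.5, which should be rounded down. I originally assumed that a decimal of 0.5 is rounded up by default, but that is not true: R's round() follows the IEC 60559 standard and rounds a half to the even digit, so round(4.5) gives 4, not 5.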

custom.rund <- function(x){
  if(x %in% c(1.5, 2.5)){
    floor(x)
  } else {
    round(x)
  }
}

sapply( c(1.5, 2.5, 3.5, 2, 4.6), custom.rund )
[1] 1 2 4 2 5
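The test vector above does not include 4.5, which is why the problem is not visible in that output. The following sketch shows how round() treats exact halves and one possible vectorised correction. The name custom_round is hypothetical and not part of the original answer.

```r
# round() goes to the even digit when the fractional part is exactly .5,
# so 1.5 and 2.5 both become 2, and 3.5 and 4.5 both become 4.
halves <- c(1.5, 2.5, 3.5, 4.5)
print(round(halves))

# Hypothetical corrected helper: floor 1.5 and 2.5, ceiling 3.5 and 4.5,
# ordinary rounding for everything else. ifelse() is vectorised, so no
# sapply() is needed.
custom_round <- function(x) {
  ifelse(x %in% c(1.5, 2.5), floor(x),
         ifelse(x %in% c(3.5, 4.5), ceiling(x), round(x)))
}

print(custom_round(c(1.5, 2.5, 3.5, 2, 4.6, 4.5, NaN)))
```

Strictly speaking only 1.5 and 4.5 need special handling, because round() already maps 2.5 to 2 and 3.5 to 4, but listing all four halves makes the intent explicit.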
desval
  • I used `sapply(data$var2, custom.rund)` and it returned `Error in round(x) : non-numeric argument to mathematical function` – Alina D May 13 '20 at 13:29
  • As the error says, most likely `data$var2` is not numeric. You can check that using `class(data$var2)` – desval May 13 '20 at 13:37
  • Try to input `4.5` to your function. The output is `4`, not `5`. – Darren Tsai May 13 '20 at 13:57
  • Your code is equivalent to `ifelse(x %in% c(1.5, 2.5), floor(x), round(x))`, but more time-consuming. Just a piece of advice, no offense. – Darren Tsai May 13 '20 at 14:06
  • @DarrenTsai Thanks a lot! Always willing to learn. I did not know about the IEC 60559 standard. – desval May 13 '20 at 14:09
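The non-numeric error from the first comment can be reproduced with a short sketch. It assumes the column had been created with format(), as in the question's attempt. format() returns character strings, which mathematical functions such as round() reject. The vector x is illustrative only.

```r
# Illustrative values only; stands in for data$var.
x <- c(1.5, 2.5, 3.5)

# format() converts numbers to character strings.
formatted <- format(round(x, 1))
print(class(formatted))

# Calling round() on character data raises an error; capture its message.
msg <- tryCatch(round(formatted), error = function(e) conditionMessage(e))
print(msg)

# Converting back with as.numeric() makes the values usable again.
restored <- as.numeric(formatted)
print(class(restored))
```

Applying any rounding logic to the original numeric column and calling format() only at the very end, if at all, avoids the problem.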