
I am trying to create a new column that sorts one column of my data into BMI categories, which I can then rbind into a new, complete data frame. However, my current method only produces a numeric vector, and as a result it won't seem to bind back to my original data set.

Here is my code:

BMI_cut <- cut(alldata$BMI, 
               breaks = c(-Inf, 18.5, 25.0, 30.0, Inf),
               labels = c("<18.50", "18.50-24.99", "25.00-29.99", ">=30.00"),        
               right = FALSE)

BMIbind <- rbind(sorteddata, BMI_cut)

When trying this, I get the following warnings:

Warning messages:
1: In `[<-.factor`(`*tmp*`, ri, value = 2L) :
  invalid factor level, NA generated
2: In `[<-.factor`(`*tmp*`, ri, value = 2L) :
  invalid factor level, NA generated

And the result is a bind with the original data and no BMI category column. The only difference is a new row with values of <NA>, 2 and 3. I can't make sense of this.

I am a complete beginner to R. Additionally, whilst there are some packages that look like they can do this much more easily, I cannot use them as this is for an assignment. Any help would be greatly appreciated.

Mr Lister
Fuz
  • Could you provide some of `alldata$BMI` using `dput`? Otherwise your problem can hardly [be reproduced](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610). – Vincent Bonhomme Apr 02 '16 at 05:43
  • For the cut command you use the data.frame `alldata`, for the rbind one you use `sorteddata`. Any chance those two are not identical? – Thilo Apr 02 '16 at 07:03

1 Answer


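I see quite a few problems. For one, as already mentioned in the comment above, you use two different data.frames: the cut is computed from alldata, but the bind is done on sorteddata.

Furthermore, you seem to mix up cbind and rbind. rbind stacks the rows of data.frames that have the same columns, while cbind puts columns side by side for objects with the same number of rows. Because BMI_cut is a vector, rbind most likely treated it as one extra row, which would explain the single new row and the invalid factor level warnings.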

Finally, this could be done by just adding another column to your data.frame like this:

alldata$BMI_cut <- cut(alldata$BMI, 
                       breaks = c(-Inf, 18.5, 25.0, 30.0, Inf),
                       labels = c("<18.50", "18.50-24.99", "25.00-29.99", ">=30.00"),        
                       right = FALSE)
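
To check what this does without your data, here is a minimal sketch on a made-up data.frame (`toy` and its values are hypothetical, since the real alldata was not posted). It suggests that cut returns a factor rather than a numeric vector, and that with right = FALSE a boundary value such as 18.5 or 25.0 falls into the upper interval:

```r
# Hypothetical toy data; the real alldata is not shown in the question
toy <- data.frame(id = 1:5, BMI = c(17.2, 18.5, 24.99, 25.0, 31.4))

# right = FALSE makes each interval closed on the left: [18.5, 25), [25, 30), ...
toy$BMI_cut <- cut(toy$BMI,
                   breaks = c(-Inf, 18.5, 25.0, 30.0, Inf),
                   labels = c("<18.50", "18.50-24.99", "25.00-29.99", ">=30.00"),
                   right = FALSE)

str(toy$BMI_cut)     # structure of the new column (a factor)
table(toy$BMI_cut)   # number of rows per BMI category
print(toy)
```

The new column has one entry per row of the data.frame, which is why it can be attached as a column but not as a row.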

For rbind and cbind compare the following:

> rbind(data.frame(x=1:5), data.frame(x=6:10))
    x
1   1
2   2
3   3
4   4
5   5
6   6
7   7
8   8
9   9
10 10
> cbind(data.frame(x=1:5), data.frame(x=6:10))
  x  x
1 1  6
2 2  7
3 3  8
4 4  9
5 5 10
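
If you would rather keep the categories in a separate vector first, a sketch along these lines should work with cbind as well. The names `toy`, `grp`, `with_cbind` and `with_assign` are made up for illustration; naming the argument in cbind is what gives the new column its name:

```r
# Hypothetical toy data standing in for the real data.frame
toy <- data.frame(id = 1:3, BMI = c(17.2, 22.0, 31.4))

# Categories kept in a separate factor, one entry per row of toy
grp <- cut(toy$BMI,
           breaks = c(-Inf, 18.5, 25.0, 30.0, Inf),
           labels = c("<18.50", "18.50-24.99", "25.00-29.99", ">=30.00"),
           right = FALSE)

# cbind attaches grp as a new column called BMI_cut
with_cbind <- cbind(toy, BMI_cut = grp)

# Direct assignment, for comparison
with_assign <- toy
with_assign$BMI_cut <- grp

dim(with_cbind)                      # same number of rows as toy, one more column
all.equal(with_cbind, with_assign)   # compares the two results
```

Both routes need the vector and the data.frame to come from the same data, in the same row order.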
Thilo