
I have a large-ish data frame (100k rows x 50 columns) with several factor variables. I want a small subset (about 100 rows) to prototype with. The problem is that when I type:

train <- train[1:100,]

the dimensions shrink (as reported by dim()), but the subset still appears to store all the factor levels from the original data frame (I'm measuring memory usage with lsos(), found here).

Is there a way to get around this? So far the only way I've found is to turn the factor variables to character strings then subset, then convert to factors again. I feel like there has to be a better way to do this.

Any suggestions?

screechOwl

1 Answer


Use the droplevels function to drop the levels that do not occur in the new data.frame; see ?droplevels for more information.

Example:

> # stringsAsFactors=TRUE is needed in R >= 4.0, where strings are no longer converted to factors by default
> DF <- data.frame(num=1:15, letter=rep(letters[1:5], each=3), random=rnorm(15), stringsAsFactors=TRUE)
> levels(DF[, 2]) # all levels
[1] "a" "b" "c" "d" "e"
> 
> DF2 <- DF[1:10, ] # subsetting
> levels(DF2[, 2]) # still all levels, including the unused "e"
[1] "a" "b" "c" "d" "e"
> DF2[, 2] <- droplevels(DF2[, 2])
> levels(DF2[, 2]) # only the levels contained in DF2
[1] "a" "b" "c" "d"
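
With several factor columns, as in the question, it may be simpler to call droplevels on the whole subset, since it also has a data.frame method that handles every factor column at once. The sketch below is only illustrative: the `train` data frame is a made-up stand-in for the asker's data, and the column names `id`, `group`, and `value` are assumptions.

```r
# Hypothetical stand-in for the asker's `train` data frame.
set.seed(1)
train <- data.frame(
  id    = 1:1000,
  group = factor(sample(sprintf("g%03d", 1:500), 1000, replace = TRUE)),
  value = rnorm(1000)
)

# Subset the rows, then drop unused levels from every factor column in one step.
train_small <- droplevels(train[1:100, ])

nlevels(train$group)        # number of levels in the full data
nlevels(train_small$group)  # only the levels that occur in the 100 rows
```

Comparing object.size(train[1:100, ]) with object.size(train_small) should show how much of the subset's footprint came from the unused levels.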
Jilber Urbina