R large data frame with factors won't shrink when subset

Question

I have a large-ish data frame (100k rows x 50 columns) with several factor variables. I want a small subset (around 100 rows) to do some prototyping with. The problem is that when I run:

train <- train[1:100,]

the number of rows shrinks (checked with dim()), but the result still appears to store all the factor levels from the original data frame (I'm measuring memory size using the lsos() helper).

Is there a way to get around this? So far the only way I've found is to convert the factor variables to character strings, subset, and then convert them back to factors. I feel like there has to be a better way to do this.

Any suggestions?

score 4 · Accepted Answer · answered Feb 19 '13 at 18:46

Use the droplevels() function to drop the levels that no longer occur in the new data frame; see ?droplevels for more information.

Example:

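> # stringsAsFactors = TRUE is needed in R >= 4.0, where strings no longer become factors by default
> DF <- data.frame(num = 1:15, letter = rep(letters[1:5], each = 3), random = rnorm(15), stringsAsFactors = TRUE)
> levels(DF[, 2]) # all levels
[1] "a" "b" "c" "d" "e"
> 
> DF2 <- DF[1:10, ] # subsetting keeps every original level
> levels(DF2[, 2]) # all levels again
[1] "a" "b" "c" "d" "e"
> DF2[, 2] <- droplevels(DF2[, 2]) # drop the levels that no longer occur
> levels(DF2[, 2]) # only the levels contained in DF2
[1] "a" "b" "c" "d"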
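
With several factor columns, as in the question, it may be simpler to call droplevels() on the whole data frame, since it also has a data frame method that handles every factor column at once. Below is a minimal sketch of that idea; the `train` object here is a made-up stand-in for the asker's data (the column names and level labels are assumptions, not taken from the question), and object.size() is used in place of lsos() to compare memory.

```r
# Hypothetical stand-in for the asker's `train` data frame:
# 100k rows with one high-cardinality factor column.
set.seed(1)
train <- data.frame(
  id  = 1:100000,
  grp = factor(sprintf("level_%05d", sample(50000, 100000, replace = TRUE)))
)

small_raw <- train[1:100, ]        # subset only: all original levels are kept
small     <- droplevels(small_raw) # drops unused levels in every factor column

nlevels(small_raw$grp)  # same as nlevels(train$grp)
nlevels(small$grp)      # at most 100

# Compare memory use before and after dropping the unused levels
object.size(small_raw)
object.size(small)
```

The same thing can be written in one step as `train <- droplevels(train[1:100, ])`, which avoids the round trip through character strings.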

+1, great, I didn't know about `droplevels`. – juba, Feb 19 '13 at 18:49
