Possible Duplicate:
dropping factor levels in a subsetted data frame in R
I am getting a little frustrated with R here; it would be great if anyone could help me with the following. I am trying to pull a subset out of my dataset, but it does not behave the way I expect.
Specifics: I have a spreadsheet with words and various features associated with each word (columns such as word, article, length, ...). Now I am trying to look at individual words, e.g. pull out all rows where the word is "hairbrush". To do so, I tried:
hairbrush=subset(dataset, word=="hairbrush")
This seems to work fine and gives me the right rows when I look at the result with `fix` or `head`. However, as soon as I try to use `xtabs` or do any other kind of computation, I do not get very far, because all the other words are still "there" and mess up my stats. For example, when I call `levels`, it gives me "hairbrush" but also all the other 200-plus words. All the data pertaining to these "hidden" words is NA, but they still mess up my stats.
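A minimal, self-contained reproduction of this behaviour, assuming a small hypothetical data frame (the values below are made up for illustration and are not the real spreadsheet). The subset keeps only the matching rows, but the `word` factor still carries every level from the full data, and `xtabs` lists those unused levels as well:

```r
# Hypothetical toy data standing in for the real spreadsheet.
dataset <- data.frame(
  word    = factor(c("hairbrush", "abbe", "aepfel", "hairbrush")),
  article = factor(c("die", "der", "der", "die"))
)

hairbrush <- subset(dataset, word == "hairbrush")

nrow(hairbrush)                  # only the 2 "hairbrush" rows are kept
levels(hairbrush$word)           # ...but all 3 levels are still listed
xtabs(~ word, data = hairbrush)  # unused levels appear as cells with count 0
```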
Is that the usual behavior of `subset`? Or am I doing something wrong? Or is this the wrong approach?
Oh, and in similar questions I found on Google, people always asked for the output of `str`, so here it is:
```
> str(hairbrush)
'data.frame': 41 obs. of 10 variables:
 $ id       : Factor w/ 1352 levels "1-1-1-11-a.eaf",..: 210 240 267 295 320 351 378 403 427 452 ...
 $ speaker  : num 24 25 26 28 29 30 32 33 34 35 ...
 $ loc      : Factor w/ 2 levels "nb","xx": 1 1 1 1 1 1 1 1 1 1 ...
 $ gilbertno: Factor w/ 27 levels "1","10","108",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ tword    : Factor w/ 65 levels "abaddream","afuneral",..: 4 4 4 4 4 4 4 4 4 4 ...
 $ word     : Factor w/ 228 levels "abbe","aepfel",..: 164 93 99 93 92 100 94 94 28 93 ...
 $ loan     : Factor w/ 5 levels "FILE","maybe",..: 4 3 5 3 5 5 3 3 3 3 ...
 $ article  : Factor w/ 40 levels "a","das","dat",..: 34 34 33 33 34 34 34 34 13 34 ...
 $ gender   : Factor w/ 13 levels "a","af","amn",..: 11 11 7 7 11 11 11 11 7 11 ...
 $ comment  : Factor w/ 4 levels "0","die macht ja vorschlaege",..: 1 1 1 1 1 1 1 1 1 1 ...
```
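The `str` output shows the mismatch directly: 41 rows, yet `word` still reports 228 levels (and the other factor columns keep their full level sets too). A sketch of the fix discussed in the linked duplicate, dropping the unused levels after subsetting with base R's `droplevels()` (available since R 2.12.0), again on the same hypothetical toy data rather than the real dataset:

```r
# Same hypothetical toy data as before, so this block runs on its own.
dataset <- data.frame(
  word    = factor(c("hairbrush", "abbe", "aepfel", "hairbrush")),
  article = factor(c("die", "der", "der", "die"))
)

hairbrush <- subset(dataset, word == "hairbrush")

# droplevels() on a data frame removes, from every factor column,
# the levels that no longer occur in the remaining rows.
hairbrush <- droplevels(hairbrush)

levels(hairbrush$word)           # now only "hairbrush"
levels(hairbrush$article)        # now only "die"
xtabs(~ word, data = hairbrush)  # a single cell for "hairbrush"

# For a single column, re-creating the factor has the same effect:
# hairbrush$word <- factor(hairbrush$word)
```

Applied to the real data, `str(droplevels(hairbrush))` should then report far fewer levels per factor column.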