Extracted data frame selection still contains entries from full data frame set

Question

I have a data frame (originally from a CSV file) with the columns NAME and YEAR. I have extracted a sample from this data frame of the first ten entries like so:

sample <- df[1:10, ]

I want to know the frequency of the values in the NAME column so I input the following:

as.data.frame(table(sample$NAME))

This counts the frequency in the sample correctly but also includes every name from the original data frame in the 'Var1' column (all with a Freq of 0).

The same thing happens if I use unique(sample$NAME): it lists the names from the sample along with all of the names from the original data frame.

What am I doing wrong?

score 0 · Accepted Answer · answered Oct 10 '15 at 17:51

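This is likely a case of unused levels in the 'NAME' factor column: subsetting a factor keeps every level of the original, even those that no longer occur in the subset. We can use droplevels or call factor again to remove the unused levels.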

as.data.frame(table(droplevels(sample$NAME)))

Or

as.data.frame(table(factor(sample$NAME)))
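
For illustration, here is a minimal reproducible sketch of the behaviour. The question does not show its data, so the data frame below is a made-up stand-in, and the names `smp`, `fixed` and `counts` are hypothetical (`smp` is used instead of `sample` only to avoid masking the base function of that name).

```r
# Hypothetical stand-in for the original data frame: 12 rows, 9 distinct names.
df <- data.frame(
  NAME = factor(c("Ann", "Bob", "Ann", "Cy", "Bob", "Ann",
                  "Dee", "Eve", "Fay", "Gus", "Hal", "Ivy")),
  YEAR = 2001:2012
)

# First ten rows; "Hal" and "Ivy" are not among them.
smp <- df[1:10, ]

# The subset still carries all 9 levels of the full column,
# so table() reports "Hal" and "Ivy" with a count of 0.
nlevels(smp$NAME)
table(smp$NAME)

# droplevels() on the data frame removes unused levels from every factor column.
fixed <- droplevels(smp)
nlevels(fixed$NAME)

# The frequency table now contains only names present in the subset.
counts <- as.data.frame(table(fixed$NAME))
counts
```

Calling droplevels on the whole data frame, rather than on one column inside table(), fixes the levels once so that later calls such as unique or table do not need to repeat it.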

answered Oct 10 '15 at 17:51

akrun

Lots of thanks for this. I searched further on this subject and found this earlier SO q&a which goes into additional detail: http://stackoverflow.com/questions/1195826/drop-factor-levels-in-a-subsetted-data-frame Specifically, sample<-droplevels(sample) – Jd S Oct 10 '15 at 18:02
I tried to immediately, but it made me wait a few minutes. Thanks again. – Jd S Oct 10 '15 at 18:05
@JdS Thanks, also consider to show some example dataset when you post in the future (for easier understanding). – akrun Oct 10 '15 at 18:06