
I have a data frame (originally from a CSV file) with the columns NAME and YEAR. I have extracted a sample from this data frame of the first ten entries like so:

sample <- df[1:10, ]

I want to know the frequency of the values in the NAME column so I input the following:

as.data.frame(table(sample$NAME))

This counts the frequencies in the sample correctly, but the 'Var1' column also lists every other name from the original data frame, each with a Freq of 0.

The same thing happens if I use unique(sample$NAME): it lists the names from the sample along with all of the names from the original data frame.
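A minimal reproducible example of this behavior. It assumes a small hypothetical data frame in place of the CSV (the names and years are made up), with NAME stored as a factor, which is what read.csv() produced by default before R 4.0.0.

```r
# Hypothetical stand-in for the CSV data; NAME is a factor
df <- data.frame(
  NAME = factor(c("Ann", "Bob", "Ann", "Cat", "Dan", "Eve", "Bob", "Fay")),
  YEAR = c(2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008)
)

# Take only the first three rows (Ann, Bob, Ann)
sample <- df[1:3, ]

# The subset still carries all six levels of the original factor
levels(sample$NAME)

# table() therefore reports the four unused names with a count of 0
counts <- as.data.frame(table(sample$NAME))
counts

# unique() returns only Ann and Bob, but the printed "Levels:" line
# still lists all six names from the original data frame
unique(sample$NAME)
```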

What am I doing wrong?

Jd S

1 Answer


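This is probably a case of unused levels in the 'NAME' factor column: subsetting rows keeps every level of the original factor, and table() reports a count for each level, including levels that no longer appear in the subset. We can use droplevels or call factor again to remove those unused levels.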

as.data.frame(table(droplevels(sample$NAME)))

Or

as.data.frame(table(factor(sample$NAME)))
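A short sketch checking that both approaches give the same counts, using a small hypothetical data frame (made-up names and years) shaped like the one in the question.

```r
# Hypothetical data with a factor NAME column and a numeric YEAR column
df <- data.frame(
  NAME = factor(c("Ann", "Bob", "Ann", "Cat", "Dan")),
  YEAR = c(2001, 2002, 2003, 2004, 2005)
)
sample <- df[1:3, ]

# Option 1: drop the levels that have no rows in the subset
via_droplevels <- as.data.frame(table(droplevels(sample$NAME)))

# Option 2: rebuild the factor; factor() keeps only the values present
via_factor <- as.data.frame(table(factor(sample$NAME)))

# Both contain only Ann (2) and Bob (1), with no zero-count rows
via_droplevels
via_factor
```

droplevels() states the intent more explicitly, while factor() is handy when you want to redefine the levels at the same time.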
akrun
  • Many thanks for this. I searched further on this subject and found this earlier SO Q&A, which goes into more detail: http://stackoverflow.com/questions/1195826/drop-factor-levels-in-a-subsetted-data-frame (specifically, sample <- droplevels(sample)). – Jd S Oct 10 '15 at 18:02
  • I tried to immediately, but it made me wait a few minutes. Thanks again. – Jd S Oct 10 '15 at 18:05
  • @JdS Thanks. In future posts, please also consider including a small example dataset; it makes the problem easier to understand. – akrun Oct 10 '15 at 18:06
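A sketch of the two data-frame-level fixes mentioned in this thread, using hypothetical data. droplevels() applied to the whole subset removes unused levels from every factor column at once and leaves non-factor columns such as YEAR unchanged. Alternatively, storing NAME as character avoids levels entirely; read.csv() has defaulted to stringsAsFactors = FALSE since R 4.0.0, and the argument can be passed explicitly in older versions.

```r
# Hypothetical data with a factor NAME column and a numeric YEAR column
df <- data.frame(
  NAME = factor(c("Ann", "Bob", "Ann", "Cat", "Dan")),
  YEAR = c(2001, 2002, 2003, 2004, 2005)
)

# Fix 1: drop unused levels from every factor column of the subset
sample <- droplevels(df[1:3, ])
levels(sample$NAME)  # only the names present in the subset
table(sample$NAME)

# Fix 2: use a character column; character vectors carry no levels,
# so table() only counts values that actually occur
sample_chr <- df[1:3, ]
sample_chr$NAME <- as.character(sample_chr$NAME)
table(sample_chr$NAME)
```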