I've got a factor with many different values. If you execute summary(factor) the output is a list of the different values and their frequency. Like so:

A B C D
3 3 1 5

I'd like to make a histogram of the frequency values, i.e. the X-axis shows the different frequencies that occur and the Y-axis shows the number of factor levels that have that particular frequency. What's the best way to accomplish something like that?

Edit: thanks to the answer below, I figured out that I can take the table of level frequencies, turn those frequencies into a factor, and plot that, which tabulates them a second time. If f is the factor, that looks like:

plot(factor(table(f)))
Ellie Kesselman
wds

1 Answer

Update in light of clarified Q

set.seed(1)
dat2 <- data.frame(fac = factor(sample(LETTERS, 100, replace = TRUE)))
hist(table(dat2), xlab = "Frequency of Level Occurrence", main = "")

gives:

histogram of frequency of occurrence in factor

Here we just apply hist() directly to the result of table(dat2). table(dat2) provides the frequencies per level of the factor, and hist() produces the histogram of those frequencies.
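If you would rather have exact counts than histogram bins, one option is to tabulate the frequencies a second time and draw a bar plot. This is only a sketch under the same assumptions as the example above: the data are hypothetical random letters, and the names f, level_counts and count_of_counts are made up for illustration.

```r
# Hypothetical example data: 100 random draws from the 26 capital letters.
set.seed(1)
f <- factor(sample(LETTERS, 100, replace = TRUE))

# Step 1: how often each level occurs.
level_counts <- table(f)

# Step 2: how many levels share each frequency
# (the "frequencies of the frequencies").
count_of_counts <- table(level_counts)

# One bar per distinct frequency, bar height = number of levels with it.
barplot(count_of_counts,
        xlab = "Frequency of Level Occurrence",
        ylab = "Number of Levels")
```

Unlike hist(), this keeps one bar per distinct frequency and does no binning, so frequencies that never occur simply have no bar.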


Original

There are several possibilities. Your data:

dat <- data.frame(fac = factor(rep(LETTERS[1:4], times = c(3,3,1,5))))

Here are three, from column one, top to bottom:

  • The default plot method for class "table", which plots the data as histogram-like bars
  • A bar plot, which is probably what you meant by histogram. Notice the low information-to-ink ratio here
  • A dot plot or dot chart; shows the same info as the other plots but uses far less ink per unit of information. Preferred.

Code to produce them:

layout(matrix(1:4, ncol = 2))
plot(table(dat), main = "plot method for class \"table\"")
barplot(table(dat), main = "barplot")
tab <- as.numeric(table(dat))
names(tab) <- names(table(dat))
dotchart(tab, main = "dotchart or dotplot")
## or just this
## dotchart(table(dat))
## and ignore the warning
layout(1)

this produces:

one dimensional plots

If you just have your data in a variable named factor (a bad name choice, by the way, because factor() is also a base R function), then table(factor) can be used rather than table(dat) or table(dat$fac) in my code examples.
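As a rough sketch of that last point, here is the same tabulation on a bare factor. The variable name f is just an illustrative stand-in for your factor, and the data reproduce the counts from the question.

```r
# Stand-in for the factor from the question (A, B, C, D with counts 3, 3, 1, 5).
f <- factor(rep(LETTERS[1:4], times = c(3, 3, 1, 5)))

# table() works on the factor directly; no data frame is needed.
counts <- table(f)
print(counts)

# Passing a plain numeric vector plus labels avoids the dotchart() warning
# mentioned in the code above.
dotchart(as.numeric(counts), labels = names(counts),
         main = "dotchart from a bare factor")
```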

For completeness, package lattice is more flexible when it comes to producing the dot plot as we can get the orientation you want:

require(lattice)
with(dat, dotplot(fac, horizontal = FALSE))

giving:

Lattice dotplot version

And a ggplot2 version:

require(ggplot2)
p <- ggplot(data.frame(Freq = tab, fac = names(tab)), aes(fac, Freq)) + 
    geom_point()
p

giving:

ggplot2 version

Gavin Simpson
  • For bonus points, you can reorder the factor levels from smallest to largest. `fac_levels <- levels(dat$fac); o <- order(table(dat$fac)); dat$fac <- with(dat, factor(fac, levels = fac_levels[o]))`. – Richie Cotton Apr 27 '11 at 13:15
  • I probably wasn't clear enough in my question. I know how to do this. What I want to do is count how many factors have a frequency of 1, how many have a frequency of 2, 3, ... and then plot that on a barchart (basically, this is a histogram if you bin it). Perhaps the fact that it is in a factor is not ideal but that's how it came out of `read.csv`. So what I want is a chart showing the frequencies of the frequencies. – wds Apr 27 '11 at 14:30
  • @wds Is that more like what you want? – Gavin Simpson Apr 27 '11 at 14:41
  • that is awesome thanks. I'd like to actually turn this into a barchart, maybe with a logarithmic y axis to highlight outliers but I guess that is a different question entirely. – wds Apr 27 '11 at 14:51
  • And the original answer was really helpful in analysing it further, thanks again. – wds Apr 27 '11 at 15:43