
Imagine you have a data frame with two variables, Name and Age. Name is of class factor and Age is numeric. Now imagine there are thousands of people in this data frame. How do you:

  1. Produce a table with NAME | COUNT(NAME), one row per unique name?

  2. Produce a histogram where you can change the minimum number of occurrences a name needs in order to show up?

For part 2, I want to be able to test different minimum frequency values and see how the histogram comes out. Or is there a better way to determine programmatically the minimum count a name needs to enter the histogram?

Thanks!

Edit: Here is what the table would look like in an RDBMS:

NAME    | COUNT(NAME)
John    | 10
Bill    | 24
Jane    | 12
Tony    | 50
Emanuel | 1
...

What I want to be able to do is create a function that graphs a histogram, with a parameter that sets the minimum frequency a name needs to be graphed. Does that make more sense?

Ivo
Tony D
  • What have you tried already, what didn't work, and where exactly would you like help? Oh, and do you have some data we could use? See http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for some help. – Andy Clifton Jan 29 '14 at 04:30
  • So I just found the `table()` function, which when you pass it an array of names will automatically give you the counts. I should be good there. For the histogram I am using `plot(df.counts$NAME[which(df.counts > 10)]);` but this is not working. – Tony D Jan 29 '14 at 04:51
  • From what you've supplied, there's no way to know what your `df.counts` looks like, nor what the element `NAME` looks like. Assuming your initial data.frame is `d`, and names are in element `name`, try `plot(as.table(table(d$name)[table(d$name) > 10]))` – jbaums Jan 29 '14 at 05:24

1 Answer

> x <- read.table(textConnection('
+    Name Age Gender Presents Behaviour
+ 1    John   9   male       25   naughty
+ 2     Bill   5   male       20      nice
+ 3     Jane  4 female       30      nice
+ 4     Jane  4 female       20      naughty
+ 5     Tony   4   male       34   naughty'
+ ), header=TRUE)
> 
> table(x$Name)

Bill Jane John Tony 
   1    2    1    1   
> layout(matrix(1:4, ncol = 2))
> plot(table(x$Name), main = "plot method for class \"table\"")
> barplot(table(x$Name), main = "barplot")
> tab <- as.numeric(table(x$Name))
> names(tab) <- names(table(x$Name))
> dotchart(tab, main = "dotchart or dotplot")
> ## or just this
> ## dotchart(table(x$Name))
> ## and ignore the warning
> layout(1)  
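
For part 1 of the question (a NAME | COUNT(NAME) table), one possible sketch is to convert the result of `table()` into a data frame. The column names `NAME` and `COUNT` and the variable `name_counts` are my own choices for illustration, and the data is just the `Name` column from the example above:

```r
# Rebuild the example names so this block runs on its own.
x <- data.frame(Name = c("John", "Bill", "Jane", "Jane", "Tony"))

# as.data.frame() turns the table into one row per name with its count.
name_counts <- as.data.frame(table(x$Name), stringsAsFactors = FALSE)
names(name_counts) <- c("NAME", "COUNT")

# Sort by descending count so the most frequent names come first.
name_counts <- name_counts[order(-name_counts$COUNT), ]
print(name_counts)
```

Keeping the counts in a data frame rather than a `table` object makes it easier to filter or merge them later, though either form should work for plotting.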


Prasanna Nandakumar
  • This is great, Prasanna, but I'm looking for a histogram (frequency chart) with a minimum threshold I can customize. For example, if the data frame had 1000 records/observations with many different names, I want a chart that shows how many times each name came up, with the ability to set a minimum number of occurrences (that way I don't have to plot names that only occur a few times each). Make sense? – Tony D Jan 29 '14 at 16:07
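
A minimal sketch of the thresholded chart requested in this comment, building on the `table()` filtering suggested in the comments on the question. The function name `plot_name_counts` and its `min_count` argument are hypothetical, not part of any package, and the chart is strictly a bar chart of counts rather than a histogram in the statistical sense:

```r
# Hypothetical helper: plot only the names seen at least `min_count` times.
plot_name_counts <- function(name_vec, min_count = 1, ...) {
  counts <- table(name_vec)
  kept <- counts[counts >= min_count]      # drop names below the threshold
  if (length(kept) == 0) {
    warning("no name occurs at least ", min_count, " times")
    return(invisible(kept))
  }
  kept <- sort(kept, decreasing = TRUE)    # most frequent names first
  barplot(kept, las = 2, ylab = "Count", ...)
  invisible(kept)                          # return the counts that were plotted
}

# Same names as in the answer's example data.
x <- data.frame(Name = c("John", "Bill", "Jane", "Jane", "Tony"))
kept <- plot_name_counts(x$Name, min_count = 2)
```

Calling the function again with a different `min_count` redraws the chart, which should make it easy to try several thresholds in a row. The returned counts can be inspected to see which names survived the cut.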