
I have a single series of values (i.e. one column of data), and I would like to create a plot with the range of data values on the x-axis and the frequency that each value appears in the data set on the y-axis.

What I would like is very close to a Kernel Density Plot:

# Kernel Density Plot
d <- density(mtcars$mpg) # returns the density data 
plot(d) # plots the results

and to the question "Frequency distribution in R" on Stack Overflow.

However, I would like frequency (as opposed to density) on the y-axis.

Specifically, I'm working with network degree distributions, and would like a double-log scale with open, circular points, i.e. this image.

I've done research into related resources and questions, but haven't found what I wanted:

Cookbook for R's Plotting distributions is close to what I want, but not precisely. I'd like to replace the y-axis in its density curve example with "count" as it is defined in the histogram examples.

The ecdf() function in R (i.e. this question) may be what I want, but I'd like the observed frequency, and not a normalized value between 0 and 1, on the y-axis.

This question is related to frequency distributions, but I'd like points, not bars.

EDIT:

The data is a standard power-law distribution, i.e.

dat <- c(rep(1, 1000), rep(10, 100), rep(100, 10), 100)
Scott Emmons
  • Could you provide sample data (real or simulated) that represents your input data? See [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for tips on how to do this. Do you want to collapse the x-axis into bins? Do you consider this data categorical or continuous? – MrFlick Jun 25 '14 at 03:07
  • See edit. I'd be glad to provide sample data, but am worried it would flood the question space and am not sure how to simulate it in R. – Scott Emmons Jun 25 '14 at 03:22
  • `dat <- c(rep(1, 1000), rep(10, 100), rep(100, 10), 100)`. No flood. – IRTFM Jun 25 '14 at 03:43

2 Answers


The integral of a density is approximately 1, so multiplying the density$y estimate by the number of values should give you something on the scale of a frequency (counts per unit of x). If you want a "true" frequency, though, you should use a histogram. Here is the rescaled density:

d <- density(mtcars$mpg)
d$y <- d$y * length(mtcars$mpg)  # rescale density to counts per unit of mpg
plot(d)

This is a histogram with breaks that are 1 unit each:

hist(mtcars$mpg, 
     breaks=trunc(min(mtcars$mpg)):(1+trunc(max(mtcars$mpg))), add=TRUE)

So this is the superposed comparison:

d <- density(mtcars$mpg)
d$y <- d$y * length(mtcars$mpg)  # rescale density to counts per unit of mpg
plot(d, ylim=c(0,4))
hist(mtcars$mpg, breaks=trunc(min(mtcars$mpg)):(1+trunc(max(mtcars$mpg))), add=TRUE)
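The factor of length(mtcars$mpg) lines up with the histogram only because the bins are 1 unit wide. With other bin widths the density would also need to be scaled by the bin width. A rough sketch of that idea, where the width `w` is an arbitrary value chosen for illustration and is not part of the original answer:

```r
x <- mtcars$mpg
w <- 2.5                                   # hypothetical bin width, picked for illustration
brks <- seq(floor(min(x)), ceiling(max(x)) + w, by = w)

h <- hist(x, breaks = brks, plot = FALSE)  # counts per bin of width w
d <- density(x)
d$y <- d$y * length(x) * w                 # approximate expected count per bin of width w

plot(h, main = "Counts with rescaled density")
lines(d)
```

The rescaled curve is still a smoothed estimate, so it should only roughly track the bar heights.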


You'll want to look at the ?density help page, where the default density bandwidth choice is criticized and alternatives are offered. If you use the adjust parameter you might see a closer (smoothed) correspondence to the histogram.
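As a sketch of what changing the bandwidth might look like: the values adjust = 0.5 and bw = "SJ" below are illustrative choices, not recommendations from the original answer, and the curves are rescaled to the frequency scale in the same way as above.

```r
x <- mtcars$mpg
n <- length(x)

d_default <- density(x)                # default bandwidth rule (bw.nrd0)
d_narrow  <- density(x, adjust = 0.5)  # half the default bandwidth, so less smoothing
d_sj      <- density(x, bw = "SJ")     # Sheather-Jones bandwidth selector

# Plot all three on the frequency scale (density times number of values)
plot(d_narrow$x, d_narrow$y * n, type = "l", lty = 2,
     xlab = "mpg", ylab = "Frequency (per unit of mpg)")
lines(d_default$x, d_default$y * n, lty = 1)
lines(d_sj$x, d_sj$y * n, lty = 3)
legend("topright", legend = c("default", "adjust = 0.5", "bw = \"SJ\""),
       lty = c(1, 2, 3))
```

A smaller bandwidth follows the histogram bars more closely at the cost of a bumpier curve, so which setting is preferable depends on the data.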


IRTFM

If you have discrete values for observations and want to make a plot with points on the log scale, then

dat <- c(rep(1, 1000), rep(10, 100), rep(100, 10), 100)

dd<-aggregate(rep.int(1, length(dat))~dat, FUN=sum)
names(dd)<-c("val","freq")

plot(freq~val, dd, log="xy")

might be what you are after.
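An equivalent way to get the counts is table(). A minimal sketch, assuming the same simulated data; the axis labels and the explicit pch = 1 (open circles, which is also the default symbol) are additions for illustration:

```r
dat <- c(rep(1, 1000), rep(10, 100), rep(100, 10), 100)

tab <- table(dat)                            # count of each distinct value
dd <- data.frame(val  = as.numeric(names(tab)),
                 freq = as.vector(tab))

# Double-log scale with open circular points
plot(freq ~ val, data = dd, log = "xy", pch = 1,
     xlab = "Value", ylab = "Frequency")
```

Note that log = "xy" cannot display zeros, so this only works when all values and counts are positive.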


MrFlick
  • That's what I was looking for. Creating a plot of that sort to examine the distribution of edge degrees in a network is very standard, but I didn't see it addressed anywhere else on SO. OP could be edited to make that problem and the solution clear for others. – Scott Emmons Jun 25 '14 at 13:56