3

I'm sure this is easy, but I've been tearing my hair out trying to find out how to do this in R.

I have some data that I am trying to fit to a power law distribution. To do this, you need to plot the data on a log-log cumulative probability chart. The y-axis is the LOG of the frequency of the data (or log-probability, if you like), and the x-axis is the log of the values. If it's a straight line, then it fits a power law distribution, and the gradient determines the power law parameter.

If I want the frequency of the data, I can just use the ecdf() function:

My data set is called Profits.negative, and it's just a long list of trading profits that were less than zero (and I've notionally converted them all to positive numbers to avoid logging problems later on).

So I can type

plot(ecdf(Profits.negative))

And I get a handy empirical CDF function plotted. All I need to do is to convert both axes to log scales. I can do the x-axis:

Profits.negative.logs <- log(Profits.negative)
plot(ecdf(Profits.negative.logs))

Almost there! I just need to work out how to log the y-axis! But I can't seem to do it, and I can't work out how to extract the figures from the ecdf object. Can anyone help?

I know there is a power.law.fit function, but that just estimates the parameters - I want to plot the data and see if it lines up.

user2047916
  • 31
  • 1
  • 2
  • 4
    You'll likely get more help by including the data you're working with. This post is helpful in that regard: http://stackoverflow.com/a/5963610/495372 – Andy Barbour Feb 06 '13 at 18:34

3 Answers3

6

You can fit and plot power-laws using the poweRlaw package. Here's an example. First we generate some data from a heavy tailed distribution:

set.seed(1)
x = round(rlnorm(100, 3, 2)+1)

Next we load the package and create a data object and a displ object:

library(poweRlaw)
m = displ$new(x)

We can estimate xmin and the scaling parameter:

est = estimate_xmin(m))

and set the parameters

m$setXmin(est[[2]])
m$setPars(est[[3]])

Then plot the data and add the fitted line:

plot(m)
lines(m, col=2)

To get:

enter image description here

csgillespie
  • 59,189
  • 14
  • 150
  • 185
  • this is really useful. I'm trying to manually replicate your plot something similar to rafa's solution below. How could I view the source code for `m` above (what exactly goes in for x and y in `plot(m)`), I would like to see the equivalent of rafa's `k <- seq_along(x)` in your code if possible? `getAnywhere(displ)` doesnt show me. I'm trying to figure out why he (you too?) use `k` rather than log(x) on the x axis? Thanks – user63230 Oct 23 '19 at 11:38
  • 1
    If you do `dd = plot(m)` you get back the x and y. Does that help? – csgillespie Oct 23 '19 at 21:57
2

Data generation first (you part, actually ;)):

set.seed(1)
Profits.negative <- runif(1e3, 50, 100) + rnorm(1e2, 5, 5)

Logging and ecdf:

Profits.negative.logs <- log(Profits.negative)
fn <- ecdf(Profits.negative.logs)

ecdf returns function, and if you want to extract something from it - it's good idea to look into function's closure:

ls(environment(fn))
# [1] "f"      "method" "n"      "nobs"   "x"      "y"      "yleft"  "yright"

Well, now we can access x and y:

x <- environment(fn)$x
y <- environment(fn)$y

Probably it's what you need. Indeed, plot(fn) and plot(x,y,type="l") show virtually the same results. To log y-axis you need just:

plot(x,log(y),type="l")
redmode
  • 4,821
  • 1
  • 25
  • 30
1

Here is an approach using ggplot2:

library(ggplot2)

# data
  set.seed(1)
  x = round(rlnorm(100, 3, 2)+1)

# organize data into a df
  df <- data.frame(x = sort(x, decreasing = T),
                   pk <- ecdf(x)(x),
                   k <- seq_along(x))

# plot
  ggplot(df, aes(x=k, y= pk)) + geom_point(alpha=0.5) + 
    coord_trans(x = 'log10', y = 'log10') +
    scale_x_continuous(breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(10^.x))) +
    scale_y_continuous(breaks = trans_breaks("log10", function(x) 10^x), labels = trans_format("log10", math_format(10^.x)))

enter image description here

rafa.pereira
  • 13,251
  • 6
  • 71
  • 109
  • why do you plot log(k) `seq_along(x)` on the x axis and not log(x) if you are trying to plot a log log plot? See my comment above under Colins answer. Thanks – user63230 Oct 23 '19 at 11:42