0

I'm writing an annual report for uni, in which I would like to detail how my usage of R has increased over the past year. I'm looking for metrics that I can use to describe my usage of R. Some possible metrics:

  • number of lines of code in history
  • number of errors
  • hours spent using program
  • number of times a particular function has been called
  • number of plots made

So my question is: can I extract any of the above from R, or can I extract any other metrics which would demonstrate my usage of R?

luciano
  • I think hours spent using the program are out the window. If I did that for my R time, it would be 24 hours a day: I have had R running constantly for the past few years. I think your question is a good starting point for people to think about how much time a certain task takes. People in engineering have everything mapped out: a cubic meter of cement takes this much time (and materials) to make. I realize that in R this may depend on experience, coffee input, sleep hours, etc. But still, ballpark figures should be enough to satisfy the bureaucrats. – Roman Luštrik Apr 17 '13 at 16:17
  • 2
    In a word, No. Not without some forward planning anyway. AFAIK unless you explicitly state otherwise you will open a fresh session when you next open R, and it will begin recording your command history. Maybe if you have never shutdown your computer or exited R you can work it out? – Simon O'Hanlon Apr 17 '13 at 16:21
  • 1
    One way to start planning is to look into the use of an `.Rprofile` file and the `.Last` function, which can be a user-defined function that is run *every time* you shut down R. – Simon O'Hanlon Apr 17 '13 at 16:24
  • 2
    Actually...stack overflow reputation points might be as good as anything! – luciano Apr 17 '13 at 16:26
  • 4
    If you have kept your code in a source code control system such as svn, git, hg, etc., then you could total the number of lines of code in each commit over the last year, the number of commits, and the average commit size, and you could determine how many lines of code you had a year ago and how many you have now. – G. Grothendieck Apr 17 '13 at 16:30
  • 1
    I'd mention the percentage of your programming work you do in R, e.g. "I now do 80% of my analyses in R instead of Excel." – Paul Hiemstra Apr 17 '13 at 16:35
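
The `.Rprofile` / `.Last` suggestion in the comments above could look roughly like the following. This is only a minimal sketch, not a tested setup: the option name `usage.log` and the default log file `~/.r_usage_log.csv` are made up for illustration, and, as noted in the comments, session length says little if R is left running for days.

```r
# Sketch of lines that could go in ~/.Rprofile.
# "usage.log" and "usage.session.start" are hypothetical option names.

.First <- function() {
  # Runs at startup: remember when this session began.
  options(usage.session.start = Sys.time())
}

.Last <- function() {
  # Runs when the session is closed with q(): append one row per session.
  start <- getOption("usage.session.start")
  if (is.null(start)) return(invisible(NULL))

  log_file <- getOption("usage.log",
                        file.path(path.expand("~"), ".r_usage_log.csv"))
  is_new <- !file.exists(log_file)
  end <- Sys.time()

  entry <- data.frame(
    start = format(start, "%Y-%m-%d %H:%M:%S"),
    end   = format(end, "%Y-%m-%d %H:%M:%S"),
    hours = round(as.numeric(difftime(end, start, units = "hours")), 3)
  )

  # Write the header only when the log file is first created.
  write.table(entry, log_file, sep = ",", row.names = FALSE,
              col.names = is_new, append = !is_new)
  invisible(NULL)
}
```

After a year, `read.csv()` on the log file would give the number of sessions and a rough total of hours. Nothing is recorded for sessions that crash or are killed, since `.Last` only runs on a normal exit.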

2 Answers

3

First, I'm not sure that this question is at all suited to Stack Overflow. Second, I think that the metrics you've identified are not really suitable. Let's look at the ones you've shortlisted so far:

  • Number of lines of code in history

    You make a lot of tweaks to your code. They accumulate in your history. Your history now has a lot of lines of code. Does that reflect positively on your usage of R? Or suppose you like to write code like the following in R:

    temp <- 0
    for (i in 1:10) {
      temp <- temp + i
    }
    print(temp)
    

    while a person familiar with R would just write sum(1:10). One line versus five. Can we really say that number of lines is a good metric?

  • Number of errors

    Maybe there is some merit to this. But are you going to classify errors in some way? Is a missing or misplaced bracket forgivable? What about times when no error or warning is issued but R behaves in a way that you might not have expected, thus leading to unexpected results (for example, assuming that numeric(0) and factor(0) would behave the same way). See here for some R gotchas, several of which won't provide any indication of an error, but would certainly lead to erroneous analysis. How would they be analyzed with this metric?

  • Number of hours spent using the program

    Again, debatable. How do you measure the number of hours? Time spent coding? Time the computer spends processing your code? Time it took you to figure out how to program your problem?

  • Number of times a particular function has been called

    I don't understand this metric at all. Do more obscure functions get a higher weight (for example, if you are one of those who use vapply while the rest of the schmucks use sapply, do you get bonus points for using vapply because it can be safer (and sometimes faster) to use?)

  • Number of plots made

    Sorry, but again, I don't understand this metric at all. First of all, not all plots are created equal! Several people in the data visualization field feel that a lot of software ruined data visualization because some of it (a very popular spreadsheet program, in particular) made it so easy for people to quickly make gaudy plots. With R, plots are less gaudy by default, but that in itself doesn't make them good. So, if you're just measuring the number of plots churned out without some other criteria for quality assessment, then I'm not sure how this metric is useful.
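
For what it's worth, the line and function-call counts discussed above can be extracted from script text with base R's parser. The following is a rough sketch only: `script_metrics` is a made-up helper name, and it assumes the input parses cleanly, which a raw history file often will not.

```r
# Count code lines and function calls in a character vector of R source.
# `script_metrics` is a hypothetical helper, not part of any package.
script_metrics <- function(code_lines) {
  stripped <- trimws(code_lines)
  # Lines that are neither blank nor pure comments.
  n_code <- sum(nzchar(stripped) & !startsWith(stripped, "#"))

  # keep.source = TRUE is needed so that getParseData() can return tokens.
  exprs  <- parse(text = code_lines, keep.source = TRUE)
  tokens <- utils::getParseData(exprs)
  calls  <- tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"]

  list(code_lines = n_code,
       calls = sort(table(calls), decreasing = TRUE))
}

# The two ways of summing 1 to 10 from the first bullet:
loop_version <- c("temp <- 0",
                  "for (i in 1:10) {",
                  "  temp <- temp + i",
                  "}",
                  "print(temp)")
short_version <- "sum(1:10)"

loop_metrics  <- script_metrics(loop_version)
short_metrics <- script_metrics(short_version)
```

Here `loop_metrics$code_lines` is 5 and `short_metrics$code_lines` is 1, even though both snippets compute the same thing, which is exactly why a raw line count is a questionable measure. The same function could be fed `readLines()` output from saved script files.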

And, from your comment to your question:

  • Actually...stack overflow reputation points might be as good as anything!

    Eh... The only time I really use R is to answer questions on Stack Overflow (unfortunately true). At the same time, almost all my reputation points here are from the questions I've answered in the R tag. Sure, there are some users here that I would really trust, but sometimes, I don't even trust myself, so I don't know if that's a good indicator of your usage of R.

    Lots of users have also complained that Stack Overflow voting is totally wacky, so I'm not sure that you really can use "reputation" as a valid measure of skill. For example, there's an ongoing discussion among regular users here that answers to "easy" questions get voted up very quickly (because they are easy to verify, often without even running the code) while answers to "complicated" questions don't yield votes proportional to the effort taken to answer the question. Case in point: why the heck do I have a "Guru" badge for an answer that is essentially a reordered version of data already easily available with two minutes on Google? I'm not particularly proud of that answer, and it certainly doesn't say anything about my "usage" of R.


Now, so that this might qualify as an answer and not just an extended comment on your question, the biggest thing that I would consider valid, though I'm not sure how to measure it, would be something like how active you are in the R community. There are many ways to get involved with R: writing or contributing to packages, filing bug reports, conducting workshops to help others make the switch to R, and so on.

I'm not suggesting that you need to write a book, as several others here have done, or to become a legendary package developer with a cult of underscore followers, but you can take small steps. For instance, although I'm a writing teacher, I have held workshops for students and written a few "getting started tips" just to introduce them to using R, so they can consider adding it to their toolkit. Many other users here regularly blog about their experiences working with R and, again, as this is part of a community, they learn a lot in the process.

Finally, a couple more ideas:

  • @PaulHiemstra suggested in his comment that you could "mention the percentage of your programming work you do in R." I would extend that concept as follows: (1) try to measure how much of your work overall is done in R and tools complementary to R (obvious ones like Sweave/knitr/LaTeX come to mind), and (2) try to measure how much of an impact using R has had on improving your overall skills (with the logic being that good programming is often accompanied by logical thought, careful organization, good documentation, and so on).

  • Related to the previous point, try to see how your usage of R has changed with time. Has your behavior changed from manually redoing the same steps to writing functions yet? Have you then gone back and adapted those functions so that, instead of solving a specific problem you had at a given point in time, they can be used more generally by a larger audience? These are pretty significant changes, particularly if you had started from scratch with the language, and they can be a bit more meaningful than the ideas you presented in your question.
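
As an illustration of the progression described in the last point, here is a small sketch using the built-in `women` dataset. The function names `standardize` and `standardize_columns` are invented for this example.

```r
# Stage 1: the same steps redone by hand for each column.
height_z <- (women$height - mean(women$height)) / sd(women$height)
weight_z <- (women$weight - mean(women$weight)) / sd(women$weight)

# Stage 2: the repeated steps wrapped in a function.
standardize <- function(x) (x - mean(x)) / sd(x)

# Stage 3: a more general version that handles any data frame,
# skips non-numeric columns, and tolerates missing values.
standardize_columns <- function(data, na.rm = TRUE) {
  numeric_cols <- vapply(data, is.numeric, logical(1))
  data[numeric_cols] <- lapply(data[numeric_cols], function(x) {
    (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
  })
  data
}

women_z <- standardize_columns(women)
```

Being able to point to a before-and-after like this in your own scripts says more about how your usage has matured than any count of lines or plots.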

So, to summarize, a lot of the easily quantifiable things that you've identified in your question will probably lead to fairly meaningless analysis. I feel that the qualitative inputs you make would be much more valuable.

A5C1D2H2I1M1N2O1R2T1
2

Another metric: take an old, complex piece of code (I don't know if you have one) and rewrite it from scratch. Use the difference in computation time as a metric.
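
A minimal sketch of such a comparison, using `system.time()`. The two functions are stand-ins invented for illustration; in practice they would be your old code and your rewrite.

```r
# Hypothetical "old" code: an explicit loop.
old_version <- function(x) {
  total <- 0
  for (value in x) total <- total + value^2
  total
}

# Hypothetical rewrite: the vectorized equivalent.
new_version <- function(x) sum(x^2)

x <- as.numeric(1:1e6)

# Check that both versions agree before comparing speed.
same_result <- isTRUE(all.equal(old_version(x), new_version(x)))

old_time <- system.time(old_version(x))[["elapsed"]]
new_time <- system.time(new_version(x))[["elapsed"]]
```

The timings depend on the machine and vary from run to run, so it is worth repeating each measurement several times before reporting a difference.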

Rcoster