
I am trying to determine what is necessary to write a line profiler for a language, like those available for Python and Matlab.

A naive way to interpret "line profiler" is to assume that one can insert time logging around every line, but the definition of a line is dependent on how a parser handles whitespace, which is only the first problem. It seems that one needs to use the parse tree and insert timings around individual nodes.

Is this conclusion correct? Does a line profiler require the parse tree, and is that all that is needed (beyond time logging)?


Update 1: Offering a bounty on this because the question is still unresolved.

Update 2: Here is a link for a well-known Python line profiler in case it is helpful for answering this question. I've not yet been able to make heads or tails of its behavior relative to parsing. I'm afraid that the code for the Matlab profiler is not accessible.

Also note that one could say that manually decorating the input code would eliminate a need for a parse tree, but that's not an automatic profiler.

Update 3: Although this question is language agnostic, this arose because I am thinking of creating such a tool for R (unless it exists and I haven't found it).
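Since R is the target, here is a small illustration of the line-vs-statement ambiguity mentioned above: R's `parse()` returns expressions (statements), not lines, and the mapping between the two is not one-to-one.

```r
# parse() returns a list of expressions: two statements can share a line,
# and one statement can span several lines.
exprs <- parse(text = "a <- 0; a <- 1
b <- (1 +
      2)")
length(exprs)  # 3 expressions: two share line 1, one spans lines 2-3
```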

Update 4: Regarding use of a line profiler versus a call stack profiler - this post relating to using a call stack profiler (Rprof() in this case) exemplifies why it can be painful to work with the call stack rather than directly analyze things via a line profiler.

Iterator
  • Are you aware of `Rprof()` in R as well as 'R CMD Rprof' for post-processing, along with at least two CRAN packages to summarize and visualize profiling data? And if you do, can you detail how what you look for is different? – Dirk Eddelbuettel Sep 27 '11 at 22:05
  • Hi again @Iterator. I know the machinery for stack-sampling is in matlab, because it can be interrupted with Ctrl-C, and you can display the stack with [dbstack](http://www.mathworks.com/help/techdoc/ref/dbstack.html). If you need to build a profiler you might automate the [manual method](http://stackoverflow.com/questions/4295799/how-to-improve-performance-of-this-code/4299378#4299378). No need to parse. – Mike Dunlavey Sep 27 '11 at 22:08
  • @Dirk: Yep, aware of `Rprof()` and have used it and the other packages extensively - [see this Q](http://stackoverflow.com/questions/7252602/digging-into-r-profiling-information). In short, `Rprof` is a call stack profiler, not a line profiler. A stack profiler is often useful and adequate, but a line profiler is, in my experience, often more helpful when working with languages such as R, Matlab, and Python. Working with `Rprof()`, I wrote a lot of code to analyze its output before giving up and manually decorating the code; the call stack can be stimulating, but painful. – Iterator Sep 27 '11 at 22:30
  • @Iterator: Thanks for the update. With that I'd say that this is the wrong forum. Ask on r-devel, or email Prof Tierney directly. – Dirk Eddelbuettel Sep 27 '11 at 22:35
  • @Dirk: I don't have my notes handy, but I was thinking of using one of the packages by Luke Tierney to parse the code, then decorate discrete code blocks, as I'm not certain that simply decorating every line will not break code. – Iterator Sep 27 '11 at 22:35
  • @DirkEddelbuettel: We were commenting simultaneously - I'll likely contact Prof. Tierney before working on it. Still, I'm interested in knowing about this beyond just R. It seems like the answer should be affirmative - a parser is necessary, but I haven't ruled out that it could be done without a parser. – Iterator Sep 27 '11 at 22:38
  • Have you looked at [profr](http://cran.r-project.org/web/packages/profr/index.html)? It at least does a bit better job at visualising the results of RProf. The next version will have [interactivity](http://vimeo.com/24475118) – hadley Sep 28 '11 at 00:22
  • @hadley: Yep, tried that as well [see this Q and comments](http://stackoverflow.com/questions/7252602/digging-into-r-profiling-information/7254472#7254472). I'm certainly looking forward to interactive versions. Unfortunately, as the raw data is a call stack profile, it isn't possible to do line-based analyses: the map from lines to calls is surjective, not bijective [as mentioned in this post](http://stackoverflow.com/questions/7252602/digging-into-r-profiling-information/7297445#7297445). I wrote a lot of code and tried a lot of tools until I realized this simple fact. – Iterator Sep 28 '11 at 00:38

2 Answers


It sounds like what you mean by a line profiler is something that measures the time spent within each line (i.e. instrumentation). I hope what you mean by time is wall-clock time, because in real, good-sized software, if you only look at CPU time you're going to miss a lot.

Another way to do it is stack-sampling on wall-clock time, as in the Zoom and LTProf profilers. Since every line of a stack sample can be localized to a line of code using only a map or pdb file, in the same way as debuggers do, there is no need to parse or modify the source.

The percent of time taken by a line of code is simply the percent of stack samples containing it. Since you are working at the line level, there is no need to distinguish between exclusive (self) time and inclusive time. This is because the line's percent of time active is what matters, whether or not it is a call to another function, a call to a blind system function, or just a call to microcode.

The advantage of looking at percents, instead of absolute times, is you don't need to worry about the app being slowed down, either by the sampling itself, or by competition with other processes, because those things don't affect the percents very much.

Also you don't have to worry about recursion. If a line of code is in a recursive function and appears more than once on a sample, that's OK. It still counts as only one sample containing the line. The reason that's OK is, if that line of code could somehow be made to take no time (such as by removing it) that sample would not have occurred. Therefore the samples containing that line would be removed from the sample set, and the program's total time would decrease by the same amount as the fraction of samples removed. That's irrespective of recursion.

You also don't need to count how many times a line of code is executed, because the number that matters for locating code you should optimize is the percent of time it's active.
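A sketch of this bookkeeping in R (the `"file:line"` sample format here is my own invention, just to make the counting rule concrete):

```r
# Each stack sample is a vector of line references; a line's cost is the
# percent of samples that contain it, counting it once per sample even
# when recursion puts it on the stack more than once.
samples <- list(
  c("f.R:10", "f.R:3"),           # line 3 reached from line 10
  c("f.R:10", "f.R:3", "f.R:3"),  # recursion: line 3 appears twice
  c("f.R:7")
)
line_percent <- function(line, samples) {
  # %in% on unique(s) makes the "count once per sample" rule explicit
  hits <- vapply(samples, function(s) line %in% unique(s), logical(1))
  100 * mean(hits)
}
round(line_percent("f.R:3", samples), 1)  # 66.7: present in 2 of 3 samples
```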

Here's more explanation of these issues.

Mike Dunlavey
  • Hey Mike! I'm glad you answered - I was going to be disappointed if you didn't find this - I've read a lot of your suggestions. ;-) You're absolutely right about timings & the information gained. Where I find the line (versus stack) profiling informative is that the actionable unit is the line or code block. LTProf is very puzzling: how does it get this information without instrumentation? I didn't think one could take the call stack and guarantee a reverse mapping back to code (i.e. the map from code to call stack is surjective, but I thought it's not bijective). – Iterator Sep 05 '11 at 21:11
  • Zoom's also very interesting; unfortunately, I think the language of interest doesn't have DWARF debug support. So, I think it all begins with a parse tree... You're also correct about wall & CPU time. I intend to measure both. – Iterator Sep 05 '11 at 21:20
  • @Iterator: I assume LTProf works just like Zoom. I've always done it by hand (as you probably know) but I also wrote a stack-sampler once. Fundamentally a stack sample is just a list of addresses - the PC, plus each return address up the chain. Each address can be looked up in the map file to get its file and line number (and function). The simplest implementation is just to save the samples, and then, for viewing, pick one of the addresses, see how many samples it's in, and display that as a percent along with the line of the file. Then you could have a butterfly view, etc. Why parse? – Mike Dunlavey Sep 05 '11 at 21:43
  • The language I'm looking at isn't usually compiled - it's interpreted. The current option is simply a statistical sampler (much like you've advocated) with the call stack. However, the log gets very messy when one line from the code invokes a lot of activity in external libraries. – Iterator Sep 05 '11 at 21:47
  • @Iterator: OK, if the language is interpreted, you probably don't want the CPU call stack. You probably want the interpreter's call stack, because that is what will relate to the user's code. Lots of interpreters allow a random-time Ctrl-C or some signal, then capture the stack and continue. [Here's a simple example in python.](http://stackoverflow.com/questions/4295799/how-to-improve-performance-of-this-code/4299378#4299378) – Mike Dunlavey Sep 05 '11 at 21:55
  • +1 Thanks for the suggestions, though I still want to focus on a line profiler; an adequate call stack profiler exists, but matching back to lines is excruciating & not automatic. Besides, I'm still curious how one can profile at the line level. The DWARF functionality of Zoom is appealing, but I guess it won't generalize. LTProf seems limited to Windows, unfortunately. – Iterator Sep 06 '11 at 02:01
  • @Iterator: The interpreter has a stack, and the code on the stack is the user code that's active, down to as fine a granularity as possible. Then it's a matter of how that information can be presented. Best of luck. – Mike Dunlavey Sep 06 '11 at 11:46

I'd say that yes, you require a parse tree (and the source) - how else would you know what constitutes a "line" and a valid statement?

A practical simplification, though, might be a "statement profiler" instead of a "line profiler". In R, the parse tree is readily available via `body(theFunction)`, so it should be fairly easy to insert measuring code around each statement. With some more work you can insert it around a group of statements that belong to the same line.
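A rough sketch of that insertion, assuming the function body is a `{` block (the helper name and output format are my own invention, not an existing API):

```r
# Rebuild a function's body so each top-level statement is timed.
instrument <- function(fn) {
  stmts <- as.list(body(fn))[-1]  # drop the enclosing `{`
  timed <- lapply(seq_along(stmts), function(i) {
    # bquote() splices the original statement and its index into a
    # wrapper that records wall-clock time around it
    bquote({
      .t0 <- Sys.time()
      .res <- .(stmts[[i]])
      cat("statement", .(i), ":",
          as.numeric(Sys.time() - .t0, units = "secs"), "secs\n")
      .res
    })
  })
  body(fn) <- as.call(c(as.name("{"), timed))
  fn
}

slow_fn <- function(x) { y <- x + 1; Sys.sleep(0.1); y * 2 }
g <- instrument(slow_fn)
g(1)  # prints a timing per statement, returns 4
```

Note that because the wrapping `{ ... }` blocks do not create a new environment, assignments inside the instrumented statements still land in the function's frame.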

In R, the body of a function loaded from a file typically also has a `srcref` attribute that lists the source location of each "line" (actually each statement):

Here's a sample function (put in "example.R"):

f <- function(x, y=3)
{
    a <- 0; a <- 1  # Two statements on one line
    a <- (x + 1) *  # One statement on two lines
        (y + 2)

    a <- "foo       
        bar"        # One string on two lines
}

Then in R:

source("example.R", keep.source = TRUE)
dput(attr(body(f), "srcref"))

Which prints this line/column information:

list(structure(c(2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L), srcfile = <environment>, class = "srcref"), 
    structure(c(3L, 2L, 3L, 7L, 9L, 14L, 3L, 3L), srcfile = <environment>, class = "srcref"), 
    structure(c(3L, 10L, 3L, 15L, 17L, 22L, 3L, 3L), srcfile = <environment>, class = "srcref"), 
    structure(c(4L, 2L, 5L, 15L, 9L, 15L, 4L, 5L), srcfile = <environment>, class = "srcref"), 
    structure(c(7L, 2L, 8L, 6L, 9L, 20L, 7L, 8L), srcfile = <environment>, class = "srcref"))

As you can "see" (the last two numbers in each structure are begin/end line), the expressions a <- 0 and a <- 1 map to the same line...
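Building on that, those begin/end line numbers (elements 7 and 8 of each srcref) give you the statement-to-line map directly:

```r
# Group the statements of f back onto their source lines
# (f must have been sourced with keep.source = TRUE).
refs <- attr(body(f), "srcref")
stmt_lines <- vapply(refs, function(s) paste(s[7], s[8], sep = "-"), character(1))
stmt_lines
# "2-2" "3-3" "3-3" "4-5" "7-8"  - entries 2 and 3 both live on line 3
```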

Good luck!

Tommy
  • I am very inclined to agree, as you're right: knowing the boundary of a statement is not the same as knowing the boundary of lines. For instance `if(condition){` and `if(condition)\n{` are often the same, but inserting logging code between `)` and `{` would often affect the behavior. Knowing that something affects flow control seems imperative, hence the "necessary" part of the question. As for being sufficient - I believe it's the case - what could be more sufficient than a parser? :) – Iterator Sep 28 '11 at 19:48