I am using RStudio, R Markdown, LaTeX, and Pandoc to clean data, construct variables, run my analysis, and report the results. I'm new to the concept of reproducible research, but I'm hooked; it makes a lot of sense.

Dynamic tables and figures are no problem. Dynamic text, however, is stumping me. I can insert inline code to say that 95% of all statistics are false, but I am not sure how I can vary my language in a reproducible way.

For instance, what if I have an object x=0.66 and I want to write "2 out of 3 dentists use Crest"? I can look at the current value of x, 0.66, and type "2 out of 3" in the text, but this is not reproducible. Let's say I get new data and rerun my analysis and x becomes 0.52. My text would be out of date. Sure, I could dynamically report that 52% of dentists prefer Crest, but a report gets stale when everything is reported as percentages.

My thought is that I could create functions to call in the text whenever I want to vary the writing. For instance, an "out.of" function could use nested ifelse() calls to produce the text:

ifelse(x < 0.09, "fewer than 1 out of 10",
ifelse(x >= 0.09 & x < 0.11, "roughly 1 out of 10",
ifelse(x >= 0.11 & x < 0.15, "slightly more than 1 out of 10",
ifelse(x >= 0.15 & x < 0.19, "nearly 2 out of 10",
ifelse(x >= 0.19 & x < 0.21, "roughly 2 out of 10",
...
ifelse(x >= 0.95 & x < 0.99, "nearly all",
ifelse(x >= 0.99, "all", "fubar"))...)

I could also create a fraction function that would do something similar for one-tenth, two-fifths, one-third...
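To make the fraction idea concrete, here is a minimal sketch that picks the closest entry from a small lookup of reader-friendly fractions. The names `friendly` and `nearest_fraction` and the particular set of fractions are hypothetical choices for illustration, not part of any package.

```r
# Hypothetical lookup of "reader-friendly" fractions; extend or trim as needed.
friendly <- c("one-tenth" = 1/10, "one-fifth" = 1/5, "one-quarter" = 1/4,
              "one-third" = 1/3, "two-fifths" = 2/5, "one-half" = 1/2,
              "three-fifths" = 3/5, "two-thirds" = 2/3,
              "three-quarters" = 3/4, "four-fifths" = 4/5,
              "nine-tenths" = 9/10)

# Return the name of the friendly fraction closest to each value of x.
nearest_fraction <- function(x) {
  vapply(x,
         function(v) names(friendly)[which.min(abs(friendly - v))],
         character(1))
}

nearest_fraction(c(0.66, 0.52))
```

Called inline in R Markdown as `r nearest_fraction(x)`, the wording would follow the data on each rerun. A fuller version would probably also add a qualifier such as "nearly" or "just over", depending on which side of the fraction x falls.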

I'm sure others have tackled this issue already. Any leads? Ideas?

Eric Green
  • This is a really interesting question, but I think it would really depend on what your limits are for readability. Do you, for instance, consider "1 out of 20" or "1 out of 25" to be valid options? What threshold do you want to set for the more general breaks (like "2 out of 5")? Once this is sorted out, I would suggest trying `cut()` and specifying labels instead of `ifelse()`. I don't think any of the packages will do that for you, but they should help you in getting there! – A5C1D2H2I1M1N2O1R2T1 Dec 30 '12 at 16:38
  • When it comes to representing percentages, I think "out of 10" is the lowest I would want to go. I've learned a ton of R this year (from a baseline of zero), but not cut(). Will look into it. Thanks. – Eric Green Dec 30 '12 at 17:40
  • So then you would have to figure out what "out of" categories are most useful. Categories like "out of {9, 8, 7, 6}" might not be very user friendly since that will tax some readers who mentally try to convert those numbers back to a percentage. Good luck! – A5C1D2H2I1M1N2O1R2T1 Dec 30 '12 at 17:47
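The `cut()` suggestion in the comments above could look roughly like the sketch below. The function name `out_of`, the break points, and the labels are assumptions for illustration; only the first few categories are shown, with a catch-all label standing in for the rest of the table.

```r
# Hypothetical helper: map a proportion to an "out of 10" phrase.
# Breaks and labels are illustrative assumptions, not a standard;
# the last label is a placeholder for the categories not spelled out here.
out_of <- function(x) {
  breaks <- c(-Inf, 0.09, 0.11, 0.15, 0.19, 0.21, Inf)
  labels <- c("fewer than 1 out of 10",
              "roughly 1 out of 10",
              "slightly more than 1 out of 10",
              "nearly 2 out of 10",
              "roughly 2 out of 10",
              "more than 2 out of 10")
  # right = FALSE makes each interval closed on the left: [lower, upper)
  as.character(cut(x, breaks = breaks, labels = labels, right = FALSE))
}

out_of(c(0.05, 0.10, 0.20))
```

Compared with nested `ifelse()` calls, the breaks and labels sit side by side in two vectors, so adding or moving a category is a one-line change.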

1 Answer

The FRACTION package may help: its fra() function returns a fraction as a string, so replacing "/" with "out of" could work. However, the output is strange for some values of j, the number of decimals:

library(FRACTION)
fra(0.66,j=2)
# [1] "33 / 50"
fra(0.66,j=1)
# [1] "7 / 1e+08"

Edit by @Dieter Menne: forget this, see @Ben Bolker below.

Dieter Menne
  • You might be able to get around this with `MASS::fractions`: `fractions(0.66, cycles = 3)` (and `MASS` is already a Recommended package). – Ben Bolker Dec 29 '12 at 20:05
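As a rough sketch of how the `MASS::fractions` suggestion could feed the "out of" wording: convert the result to a string and swap the slash for words. The variable names `f` and `phrase` are hypothetical, and the right value of `cycles` will likely depend on how coarse a fraction is wanted.

```r
library(MASS)  # ships with R as a Recommended package

x <- 0.66

# cycles caps the continued-fraction expansion, which keeps denominators small
f <- fractions(x, cycles = 3)

# as.character() returns the "numerator/denominator" string;
# replace the slash with the words "out of"
phrase <- sub("/", " out of ", as.character(f), fixed = TRUE)
phrase
```

Other values of x may still produce reader-unfriendly denominators at the same `cycles` setting, so it seems worth checking the output over the range of values the report is likely to see.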