
R handles large datasets well. What I have trouble with is getting the results of an analysis performed in R into tables for publication.

Let me illustrate with an example. Take this simple dataset:

# Simulated data: 100 values per genotype, with gender cycling M, F, F, F
df <- data.frame(
  value    = c(rnorm(100, 500, 90), rnorm(100, 800, 120)),
  genotype = rep(c("A", "B"), each = 100),
  gender   = rep(c("M", "F", "F", "F"), 50),
  stringsAsFactors = TRUE
)

I would like to analyze the data for a scientific project. To extract the information I need, I have to do this:

> quantile(subset(df,gender=="M")$value)
       0%       25%       50%       75%      100% 
 323.6955  523.1237  655.6593  828.7438 1045.0406 
> quantile(subset(df,gender=="F")$value)
       0%       25%       50%       75%      100% 
 233.3721  520.1101  633.8767  802.2277 1149.3072 
> wilcox.test((subset(df,gender=="M")$value),(subset(df,gender=="F")$value))$p.value
[1] 0.924699
> table(df$genotype)

  A   B 
100 100 
> table(df$gender)

  F   M 
150  50 
> prop.test(50,150)$p.value
[1] 6.311983e-05
> table(df$genotype,df$gender)

     F  M
  A 75 25
  B 75 25
> prop.table(table(df$genotype,df$gender),2)

     F   M
  A 0.5 0.5
  B 0.5 0.5
> prop.test(c(75,25),c(125,50))$p.value
[1] 0.2990147

This gives me all the information I need, but it is a long way from a publication-quality table. To get there, I have to copy and paste the numbers from the results into Excel. The final product is this:

[Image: the finished table, assembled by hand in Excel]

The problem is that copying and pasting is inconvenient, becomes tedious with large amounts of data, and invites human error. Is there a way to "program" or "encode" this table directly in R, so that I can just run the code and save the output as a .csv file?

Oposum
  • Some info: http://stackoverflow.com/questions/9660359/general-guide-for-creating-publication-quality-tables-using-r-sweave-and-latex –  Dec 21 '15 at 01:34

1 Answer


You could use the Publish package (not yet on CRAN, but it can be installed from GitHub).

# install.packages("devtools")   # needed once if devtools is not installed
library(devtools)
install_github("tagteam/Publish")
library(Publish)

Then you can use the univariateTable function to get exactly what you are asking for (Q requests the median and IQR):

univariateTable(gender ~ Q(value) + genotype, data=df)
  Variable        Level    gender = M (n=50)   gender = F (n=150)
1    value median [iqr] 647.0 [488.4, 829.0] 615.4 [493.5, 797.4]
2 genotype            A            25 (50.0)            75 (50.0)
3                     B            25 (50.0)            75 (50.0)
         Total (n=200) p-value
1 617.9 [491.0, 812.4]   0.666
2           100 (50.0)        
3           100 (50.0)   1.000

The function returns a data frame which can easily be saved to a text file using, say, write.table or something similar.
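
As a rough sketch of that last step, the base R function write.csv (a wrapper around write.table) can save any data frame as a .csv file. The data frame `tab` below is a hypothetical stand-in for the summary table, with values copied from the output above and made-up column names; how the Publish result should be converted to a plain data frame is not shown here.

```r
# Hypothetical stand-in for the summary table. The values are copied from
# the output above; the column names are invented for this example.
tab <- data.frame(
  Variable = c("value", "genotype", "genotype"),
  Level    = c("median [iqr]", "A", "B"),
  Male     = c("647.0 [488.4, 829.0]", "25 (50.0)", "25 (50.0)"),
  Female   = c("615.4 [493.5, 797.4]", "75 (50.0)", "75 (50.0)"),
  stringsAsFactors = FALSE
)

# A temporary file is used here; replace it with a real path such as
# "table1.csv" to keep the result.
out_file <- tempfile(fileext = ".csv")
write.csv(tab, file = out_file, row.names = FALSE)

# Read the file back to confirm that the table survives the round trip.
check <- read.csv(out_file, stringsAsFactors = FALSE)
```

Setting row.names = FALSE keeps R's row numbers out of the file, which is usually what a spreadsheet or journal template expects.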

ekstroem
  • Thanks for the excellent suggestion, I will definitely look into it. How do I install it from GitHub? I get error messages for both commands above. – Oposum Dec 21 '15 at 02:28
  • @Oposum You need to install package `devtools` first –  Dec 21 '15 at 02:41
  • I am experiencing a massive problem with dependencies, do you know the link to the developer of the package so that maybe I can get some help there? – Oposum Dec 21 '15 at 03:04
  • @Oposum Dependencies of what? –  Dec 21 '15 at 07:02
  • You can use [the GitHub page](https://github.com/tagteam/Publish) to get in contact with the developer or list issues. And just use the commands I listed to install directly from the GitHub page – ekstroem Dec 21 '15 at 07:44
  • @Pascal trying to install `devtools` results in error messages: `configuration failed for package ‘curl’ dependency ‘BH’ is not available for package ‘xml2’ dependency ‘curl’ is not available for package ‘httr’ dependencies ‘curl’, ‘xml2’ are not available for package ‘rversions’ dependencies ‘httr’, ‘curl’, ‘rversions’ are not available for package ‘devtools’` – Oposum Dec 22 '15 at 02:25
  • You need to use the `dependencies = TRUE` argument in `install.packages`. –  Dec 22 '15 at 02:57
  • @ekstroem The dependencies still won't install. I get the following message, which I shortened for this post but which goes on: `1: In install.packages("devtools", dependencies = TRUE) : installation of package ‘BH’ had non-zero exit status 2: In install.packages("devtools", dependencies = TRUE) : installation of package ‘curl’ had non-zero exit status 3: In install.packages("devtools", dependencies = TRUE) : installation of package ‘xml2’ had non-zero exit status` – Oposum Dec 26 '15 at 18:53
  • What operating system are you running on? – ekstroem Dec 26 '15 at 19:17
  • @ekstroem I finally got it working and it works great. But there are two more things I would like to do with `univariateTable`: 1. Can it also calculate percentages for categorical variables, or does it only work with continuous variables? 2. Is there a way to have it distinguish which variables are normally distributed and which are not (or is there any other automated way to do this), so that standard deviation vs. interquartile range gets applied automatically? – Oposum Jan 05 '16 at 21:04
  • You already get the percentages for factors as shown in the example output above. As for getting the function to determine the best way to present data: no that cannot be done within the `univariateTable` function. You'd need to do the test externally and then specify the proper expression based on those results. – ekstroem Jan 05 '16 at 23:24
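
One way the external check mentioned in the last comment might look, as a minimal base R sketch. The helper `describe_numeric` is hypothetical and not part of Publish, and the use of a Shapiro-Wilk test with a 0.05 cut-off is an assumption made for illustration, not something the thread prescribes.

```r
# Hypothetical helper (not part of Publish): report mean (SD) when a
# Shapiro-Wilk test does not reject normality, otherwise median [IQR].
# The 0.05 cut-off is an arbitrary assumption. shapiro.test() needs
# between 3 and 5000 non-missing values.
describe_numeric <- function(x, alpha = 0.05) {
  x <- x[!is.na(x)]
  looks_normal <- shapiro.test(x)$p.value >= alpha
  if (looks_normal) {
    sprintf("%.1f (%.1f)", mean(x), sd(x))
  } else {
    q <- quantile(x, c(0.25, 0.5, 0.75))
    sprintf("%.1f [%.1f, %.1f]", q[2], q[1], q[3])
  }
}

set.seed(1)
describe_numeric(rnorm(100, 500, 90))   # sample drawn from a normal distribution
describe_numeric(rexp(100, 1 / 500))    # strongly right-skewed sample
```

The same logical test could be used to decide, variable by variable, which summary to request in the table formula. Note that a formal normality test is only one possible criterion, and with large samples it can flag departures too small to matter.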