2

In response to the helpful comments, I have edited the original question (where I had assumed that a for-loop and an apply-loop give different results).

I am using R to run a large number of 2-group t-tests, using input from a delimited table. Following recommendations from here and elsewhere, I tried either 'for-loops' and 'apply' to accomplish that. For 'normal' t.test, both work nicely and give the same results. However, for a paired t-test, the for-look appears to works while the apply-loop does not. Later, i found out that both loops suffer from the same problem (see below) but the for-loops deals more gracefully with the situation (only one cycle of the loop returns an invalid result) while the apply-loop fails altogether.

My input file looks like this: (the first line is a header line, the data lines have a name, 4 datapoints for group 1 and 4 datapoints for group 2):

header g1.1 g1.2 g1.3 g1.4 g2.1 g2.2 g2.3 g2.4
name1  0    0.5  -0.2 -0.2 -0.1 0.4 -0.3 -0.3
name2  23.2 24.4 24.5 27.2 15.5 16.5 17.7 20.0
name3  .....

and so on (overall ~50000 lines). The first data line (starting with name19 turned out to be the culprit.

This is the for-loop version that works better (failes on the problematic line but correctly deals with all other lines):

table <- read.table('ttest_in.txt',head=1,sep='\t')
for(i in 1:nrow(table)) {
   g1<-as.numeric((table)[i,2:5])
   g2<-as.numeric((table)[i,6:9])
   pv <- t.test(g1,g2,paired=TRUE)$p.value
}

This is the 'apply' version that causes problems

table <- read.table('ttest_in.txt',head=1,sep='\t')
pv.list <- apply(table[,2:9],1,function(x){t.test(x[1:4],x[5:8],paired=TRUE)$p.value})

One of the ~50000 data lines is problematic in that the differences of all pairwise comparions are identical, which in a paired t-test results in an undefined p-value (essentially zero). The apply loop crashes with the error 'data are essentially constant'. To me (as an R newbie) it does not seem to be a good idea to crash the entire script just because the t.test doesn't like one piece of data. In the for-loop, this data line also results in an error message but the loop continues and all the other t-tests give correct results.

Did I do something fundamentally wrong? This behaviour a essentially prohibits the usage of apply-loops for this kind of batch analysis. Or is there a standard way to circumvent this problem. Why doesn't the t-test just return something invalid for that particular p-value instead of bailing out?

C8H10N4O2
  • 18,312
  • 8
  • 98
  • 134
user900889
  • 75
  • 3
  • 8
  • 2
    Can you provide a minimal example reproducing the error? We probably need more help than you give us above. However, don't do the `as.matrix()` line in the `apply()` version. If `header` is really in the file then you may be getting into trouble there. `apply()` works on a data frame just fine, so you can use `pv.list <- apply(table[, 2:9], 1 ,function(x) { t.test(x[1:4], x[5:8], paired=TRUE)$p.value})` rather than work on the matrix `mat`. Does that fix things? – Gavin Simpson Feb 28 '12 at 12:01
  • To add to Gavin's excellent questions: try running your `apply` code with `debug` turned on. Some semi-obvious indexing error may show up. – Carl Witthoft Feb 28 '12 at 13:02
  • Thanks for your comments. Gavin's suggestion to skip the as.matrix step simplified the script but did not change anything. I am not familiar with 'debug', I might try this later. Next, I tracked down the culprit line and now i (partially) undestand the problem. I have changed the example file above, it is the first line that causes the problem. I can see why a paired t-test doesn't like these data, but is this reason enough to crash the entire 'apply' loop? Is there anything I can do about this? – user900889 Feb 28 '12 at 17:02
  • I should add that after understanding the problem, I see that the for-loop also rejected this particular data line. I failed to notice this because the rest of the loop worked fine and the output file looked unconspicous (with just one output line missing) – user900889 Feb 28 '12 at 17:14
  • Good job doing the back-review. Can you either edit your question or delete it to reflect your updated problem? – Carl Witthoft Feb 28 '12 at 20:17
  • I have heavily edited the original question. Is this the policy here? On the downside, it makes some of the original comments/answers look weird – user900889 Feb 28 '12 at 21:25

4 Answers4

5

UPDATE Since you said that the for-loop ALSO give the error and you want the apply version to be more robust, why not simply add a tryCatch?

pv.list <- apply(table[,2:9],1, function(x) tryCatch( 
  t.test(x[1:4],x[5:8],paired=TRUE)$p.value, error=function(x) NA ))

This should return NA if the p.value could not be calculated. You could change to another value (e.g. NULL, 0 or Inf) by editing the error handling function.

Old Post

I noted that t.test (kind of) gives the error you found when some values are Inf (which seems to be a bug):

> t.test(1:10, c(rep(1,9), Inf), paired=TRUE)
Error in if (stderr < 10 * .Machine$double.eps * abs(mx)) stop("data are essentially constant") : 
missing value where TRUE/FALSE needed

So do you actually get this or does it actually say:

Error in t.test: data are essentially constant

Still not quite clear why the for-loop works though. But note that in your for-loop you do as.numeric, which you don't do in the apply case...

Tommy
  • 39,997
  • 12
  • 90
  • 85
  • I agree that this behavior is also undesirable (but having Inf in the data is easier to spot than the harmlessly-looking data line causing my problem). It seems that t.test is not suitable for batch processing (or that some more precautions have to be made). BTW, the for-loop also fails for the problematic line but at least continues and deals with all other data lines – user900889 Feb 28 '12 at 21:28
  • @user900889 - Aha! See my updated answer for a possible solution. – Tommy Feb 28 '12 at 22:50
  • Why I didn't simply add TryCatch? Because I didn't know that it existed :-). Thanks! – user900889 Feb 29 '12 at 09:52
1

In situations like this, I catch all the warnings and errors and investigate them afterwards, as shown here: How do I save warnings and errors as output from a function?

You may also find some good ideas here: How to tell lapply to ignore an error and process the next thing in the list?

Community
  • 1
  • 1
Aaron left Stack Overflow
  • 36,704
  • 7
  • 77
  • 142
0

t.test(rep(1,4),rep(0,4))would report the same error. And https://stat.ethz.ch/pipermail/r-help/2008-February/154167.html is the answer to deal with 1) zero-variance 2) observation number is not ≥ 1

Puriney
  • 2,417
  • 3
  • 20
  • 25
  • sorry, I mean probably it is the zero-variance that make the problem happen, i just raise a simple example for simple notification. I will answer in full – Puriney Jul 04 '13 at 03:23
0

I've run into exactly this problem, using tapply to run a large number of paired t-tests. I use sample size as a guide to remove data that I probably shouldn't be running t-tests on anyways.

Knikki53
  • 1
  • 1