How to handle multiple sets of data in R programming?

Question

cuts <- cut(data$Time, breaks=seq(0, max(data$Time)+400, 400))
by(data$Oxytocin, cuts, mean)

But this works for only one person's data. I have ten people, each with their own Time and Oxytocin data. How would I get their averages simultaneously? Also, instead of having this type of output:

cuts: (0,400]
[1] 0.7
------------------------------------------------------------ 
cuts: (400,800]
[1] 0.805

Is there a way I can get a list of those cuts?

Could you provide an example of your data structure? How are you storing it in R? Knowing that, I think I can help with a simple solution. — Oscar de León, Feb 20 '13 at 21:42
I'm just using ....data=read.delim("clipboard")...the data is from an excel spreadsheet — Marco De Niro, Feb 21 '13 at 03:07
Ok, I get it. I need a precise idea of the data structure. Could you run `dump(head(data, 10), "")` and paste output here? Is it possible to share a bit of the data? That would help a lot. If there are confidentiality issues, maybe you could multiply the numbers by random values first. — Oscar de León, Feb 21 '13 at 03:39
I posted a new question about this coz my data format is making things too hard for me....can u check this link and see if it makes any sense to you...http://stackoverflow.com/questions/14994084/multiple-columns-of-data-and-getting-average-r-program — Marco De Niro, Feb 21 '13 at 03:49

score 1 · Accepted Answer · answered Feb 20 '13 at 15:52

Here's a solution using the IRanges package.

idx assumes your data format is Time, data, Time, data, and so on. So it holds the indices of the Time columns: 1, 3, 5, ..., ncol(df)-1.

ir1 holds the intervals you want the mean for. Each has width 401, so the first covers 0 to 400, the next 401 to 801, and so on up to max(Time), for each Time column (here columns 1 and 3).

ir2 is the corresponding Time column, with each value stored as an interval of width 1.

Then I get the overlaps of ir1 with ir2, which tells me which points in ir2 fall into which interval of ir1. From those I calculate the mean per interval and output the data.frame.

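library(IRanges)  # Bioconductor package

idx <- seq(1, ncol(df), by=2)  # Time columns: 1, 3, 5, ...
o <- lapply(idx, function(i) {
    # Query intervals 0-400, 401-801, ... up to max(Time)
    ir1 <- IRanges(start=seq(0, max(df[[i]]), by=401), width=401)
    # Each Time value as a range of width 1
    ir2 <- IRanges(start=df[[i]], width=1)
    hits <- findOverlaps(ir1, ir2)
    # Mean of the data column, grouped by the interval each row falls in
    d <- data.frame(mean=tapply(df[[i+1]][subjectHits(hits)], queryHits(hits), mean))
    cbind(as.data.frame(ir1), d)
})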

> o
# [[1]]
#   start  end width      mean
# 1     0  400   401 0.6750000
# 2   401  801   401 0.8050000
# 3   802 1202   401 0.8750000
# 4  1203 1603   401 0.2285333

# [[2]]
#   start  end width    mean
# 1     0  400   401 0.73508
# 2   401  801   401 0.13408
# 3   802 1202   401 0.26408
# 4  1203 1603   401 1.06408
# 5  1604 2004   401 3.06408

The result is a list with one data.frame per Time column, giving each interval and the mean for that interval.
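
If IRanges is not available, a base R sketch along the same lines might use cut() and tapply() on each Time/data column pair. This is an alternative, not the code above: the data frame `df` below is made-up illustration data, `interval_means` is a hypothetical helper name, and the bins are (0,400], (400,800], ... as in the question, so a Time of exactly 0 would fall outside the first bin unless include.lowest = TRUE is passed to cut().

```r
# Hypothetical data: two people, columns laid out as Time, data, Time, data.
df <- data.frame(
  Time1 = c(100, 300, 500, 700),
  Oxy1  = c(0.6, 0.8, 0.7, 0.9),
  Time2 = c(50, 450, 850, 1250),
  Oxy2  = c(1.0, 2.0, 3.0, 4.0)
)

# Mean of `value` within consecutive Time bins of the given width.
# interval_means is an illustrative name, not part of any package.
interval_means <- function(time, value, width = 400) {
  ok <- !is.na(time) & !is.na(value)   # drop padding rows, if any
  time <- time[ok]
  value <- value[ok]
  breaks <- seq(0, max(time) + width, by = width)
  # dig.lab = 5 keeps labels such as (800,1200] out of scientific notation
  bins <- cut(time, breaks = breaks, dig.lab = 5)
  m <- tapply(value, bins, mean)       # empty bins come back as NA
  data.frame(interval = names(m), mean = as.vector(m),
             stringsAsFactors = FALSE)
}

idx <- seq(1, ncol(df), by = 2)        # Time columns: 1, 3, ...
res <- lapply(idx, function(i) interval_means(df[[i]], df[[i + 1]]))
```

Here `res` is a list with one data.frame per person, which also gives the plain table of cuts and means asked for in the question instead of the by() printout.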

Arun

Thanks for the answer Arun, but I'm getting an error like this :Error in as.data.frame(ir1) : object 'ir1' not found – Marco De Niro Feb 21 '13 at 05:48
For this data or for another one of your data? – Arun Feb 21 '13 at 06:55
Do you have `IRanges` package installed and loaded using `library(IRanges)`? – Arun Feb 21 '13 at 07:01
For both of them....I tried installing it and I got this warning message...package ‘IRanges’ is not available (for R version 2.15.1) – Marco De Niro Feb 21 '13 at 09:32
Sorry, I should have mentioned it. Check here on [**how to install**](http://www.bioconductor.org/packages/2.11/bioc/html/IRanges.html). It's part of the Bioconductor project. – Arun Feb 21 '13 at 09:58
hey arun...I got this error when I used the first line of the code...Error in seq.default(1, ncol(df), by = 2) : 'to' must be of length 1....what does it mean? Also just to clarify...df means degrees of freedom? – Marco De Niro Feb 22 '13 at 03:25
Nope, `df` is the name of your `data.frame` (`data` in your case I suppose). – Arun Feb 22 '13 at 07:05
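
The "'to' must be of length 1" error is consistent with `df` never having been assigned. In that case the name `df` resolves to the F-distribution density function in the stats package, `ncol(df)` is NULL, and `seq()` fails. A minimal sketch of the fix, assuming the data were read into an object named `data` as in the question (the toy values below are made up):

```r
# With no data.frame called df, the name refers to stats::df (a function),
# so ncol() returns NULL and seq(1, NULL, by = 2) cannot work.
no_columns <- is.null(ncol(stats::df))

# Hypothetical stand-in for the data read via read.delim("clipboard").
data <- data.frame(Time = c(100, 500), Oxytocin = c(0.7, 0.8))

# Point df at the data.frame before running the answer's code.
df <- data
idx <- seq(1, ncol(df), by = 2)   # indices of the Time columns
```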
