I am trying to subset a data frame based on a range of time. Someone asked this question in the past, and the answer was to use R CMD INSTALL lubridate_1.3.1.tar.gz (see the linked question: "subset rows according to a range of time").

The issue with this answer is that I get the following warning:

> install.packages("lubridate_1.3.2.tar.gz")
Warning in install.packages :
  package ‘lubridate_1.3.2.tar.gz’ is not available (for R version 3.1.2)

I am looking for something very similar to that answer, but I cannot figure out how to do it. I have a data frame, MasterTable, with all of my data organized into columns. One of my columns is called maxNormalizedRFU.

My question is simple: How can I subset my maxNormalizedRFU column by time?

I would simply like to add another column that only shows maxNormalizedRFU for the data between 10 hours and 14 hours. Here is what I have so far:

library(plyr)   # ddply(), .()
library(dplyr)  # inner_join(); loaded after plyr so plyr does not mask dplyr functions
#Creates the master table
MasterTable <- inner_join(LongRFU, LongOD, by = c("Time.h", "Well", "Conc.nM", "Assay"))
#normalizes fluorescence (RFU) by optical density (OD); the data contain 6 subsets identified by "Assay"
MasterTable$NormalizedRFU <- MasterTable$AvgRFU / MasterTable$AvgOD
#creates a column holding the maximum NormalizedRFU within each Conc.nM/Assay group
MasterTable <- ddply(MasterTable, .(Conc.nM, Assay), transform, maxNormalizedRFU = max(NormalizedRFU))
#The issue: this line does not work
MasterTable$CutmaxNormalizedRFU <- ddply(maxNormalizedRFU, "Time.h", transform, [MasterTable$Time.h < 23.00 & MasterTable$Time.h > 10.00,])
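A minimal base R sketch of the column the last line is aiming for, assuming the goal is the maximum NormalizedRFU restricted to the time window within each Conc.nM/Assay group. The helper name `add_windowed_max` and the toy data are hypothetical, not from the post. Rows outside the window are masked to NA before the group maximum is taken:

```r
# Hypothetical helper (not from the original post): adds CutmaxNormalizedRFU,
# the maximum NormalizedRFU within (lower, upper) hours for each
# Conc.nM/Assay group. Uses strict bounds, like the condition in the question.
add_windowed_max <- function(df, lower = 10, upper = 23) {
  in_window <- df$Time.h > lower & df$Time.h < upper
  # Keep NormalizedRFU only inside the window; everything else becomes NA
  masked <- ifelse(in_window, df$NormalizedRFU, NA_real_)
  # ave() applies FUN per group and recycles the result over that group's rows;
  # groups with no rows in the window get NA instead of max()'s -Inf
  df$CutmaxNormalizedRFU <- ave(
    masked, df$Conc.nM, df$Assay,
    FUN = function(x) if (all(is.na(x))) NA_real_ else max(x, na.rm = TRUE)
  )
  df
}

# Made-up toy data: two assays sharing the same time points
toy <- data.frame(
  Time.h        = rep(c(5, 12, 20, 24), times = 2),
  Assay         = rep(c("A", "B"), each = 4),
  Conc.nM       = 10,
  NormalizedRFU = c(9, 2, 3, 1, 4, 8, 6, 7)
)
res <- add_windowed_max(toy)
res$CutmaxNormalizedRFU  # 3 3 3 3 8 8 8 8
```

The bounds are arguments, so a 10 to 14 hour window would be `add_windowed_max(MasterTable, 10, 14)`. Note that the full-range maximum for assay A in the toy data is 9 (at 5 h), while the windowed maximum is 3.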

Attached is a sample of my dataset. Since the original file has over 90 000 lines, I have only attached a small fraction of it (only one assay and one concentration).

My last line currently uses ddply to do the subsetting, but it simply does not work. Does anyone have a suggestion as to how to fix this?

Thank you in advance!

Marty

Marty999

1 Answer

I downloaded your data and had a look. If I am not mistaken, all you need is to subset the data on Time.h for the time range you want (10-23). I used dplyr as follows, which asks R to keep the rows whose Time.h value lies between 10 and 23. Your data frame is called mydf here.

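library(dplyr)
# replace mydf with the name of your own data frame (e.g. MasterTable)
# between() is inclusive: keeps rows with 10 <= Time.h <= 23
filter(mydf, between(Time.h, 10, 23))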
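Two details may be worth keeping in mind: between() includes both endpoints, whereas the condition in the question (Time.h > 10 and Time.h < 23) excludes them, and filter() returns a new data frame rather than modifying mydf, so the result has to be assigned to be kept. A base R sketch of both variants on made-up data (the values are illustrative only):

```r
# Made-up example data; mydf stands in for your own data frame
mydf <- data.frame(Time.h = c(5, 10, 12, 23, 24),
                   NormalizedRFU = c(1, 2, 3, 4, 5))

# Inclusive bounds: same rows as filter(mydf, between(Time.h, 10, 23))
inclusive <- subset(mydf, Time.h >= 10 & Time.h <= 23)

# Strict bounds, as in the question's Time.h > 10 & Time.h < 23
strict <- subset(mydf, Time.h > 10 & Time.h < 23)

inclusive$Time.h  # 10 12 23
strict$Time.h     # 12
```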
jazzurro
  • @Marty999 90,000 lines are not that large. Could I ask what you mean by saying the time frame repeats itself over and over? – jazzurro Jan 28 '15 at 02:47
  • Hi Jazzurro, thank you very much for the reply. I have tried this method and I cannot seem to get it to work. I think the issue is that the time frame (in the original data) repeats itself over and over. Since this is a long table, as soon as 24 hours go by, the table starts its next "Assay". It's hard to have a function cut the data, as it needs to actively seek all of the data that match the 10-23 hour criterion... If you take a look at the second-to-last line in my post, do you think it is possible to add a criterion to it so that it only picks values within a given time frame...? – Marty999 Jan 28 '15 at 02:52
  • Sorry, that was not very clear. I have 6 different experiments running in a 96-well plate. Therefore, all of the data is taken over the same 24-hour period, but it is divided into 6 subsets (in my table). The subsets are stacked one below the other: when one 24-hour subset ends, the next one is placed below it. Does this make sense? – Marty999 Jan 28 '15 at 02:55
  • @Marty999 I am trying to understand what you described. It seems that you possibly have 6 levels in `Assay`. Is that right? You want to subset the data for each `Assay` with that 10-23 criterion. Is that right? You have a long-format data frame made by binding the six subsets. Is that right? In the worst case, could you upload the whole data? Otherwise it may be difficult to see what is going on. – jazzurro Jan 28 '15 at 03:04
  • Yes, I think you understand. I have added a comment to my post with a new data set that combines two assays and all of the columns. In that table, the subsets are called "Assay". To give you an idea, here is the plot I wish to achieve: https://app.box.com/s/zefii3hf6drkgb1yziv73zkhblt7ljbq Again, thank you for your help in the matter! – Marty999 Jan 28 '15 at 03:18
  • @Marty999 Thanks for the data. `filter(mydf, between(Time.h, 10, 23))` still gets all rows that have values between 10 and 23 on my machine. If you want to produce the graphic, you can use `Assay` as the facet variable in `ggplot`. I am not sure what is not working for you. Would you please clarify your point further? – jazzurro Jan 28 '15 at 03:29
  • I am not sure why I cannot get it to work on my machine. I get the following error: `Error in filter_(.data, .dots = lazyeval::lazy_dots(...)) : object 'mydf' not found`. Any thoughts? – Marty999 Jan 28 '15 at 03:37
  • @Marty999 Could you change `mydf` to your data frame's name? If you have an object called `MasterTable2` as your data frame, `mydf` should be `MasterTable2`. On your machine, you do not have an object called `mydf`. Once you change this part, you should see the expected outcome. Let me know how it goes. – jazzurro Jan 28 '15 at 03:41
  • Wow you are absolutely right! Well done and thank you very much for helping me debug this! How do I give you gold?? :) – Marty999 Jan 28 '15 at 04:03
  • @Marty999 That's all good. Now you can move onto your graphic work. Good luck! :) – jazzurro Jan 28 '15 at 04:05
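On the concern about the time frame repeating for every Assay: filter() tests each row's Time.h on its own, so it keeps the 10-23 h rows of every assay even when Time.h restarts at 0 for each Assay in the long table. A base R sketch on made-up long-format data (two hypothetical assays, values invented for illustration), which also takes the maximum NormalizedRFU per Assay inside the window:

```r
# Made-up long-format data: two assays, each with its own 0-24 h time course
long <- data.frame(
  Time.h        = rep(0:24, times = 2),
  Assay         = rep(c("A", "B"), each = 25),
  NormalizedRFU = c(0:24, 24:0)
)

# Row-wise filter: keeps 10 <= Time.h <= 23 within every assay
windowed <- subset(long, Time.h >= 10 & Time.h <= 23)
table(windowed$Assay)  # 14 rows for each assay

# Maximum NormalizedRFU per Assay, restricted to the window
per_assay_max <- aggregate(NormalizedRFU ~ Assay, data = windowed, FUN = max)
per_assay_max
```

With dplyr, the same idea would be filter(between(Time.h, 10, 23)) followed by group_by(Assay) and summarise(); the sketch sticks to base R so it runs without extra packages.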