0

I have a data.frame column that looks like this

       a$`1`
1   1316.100
2      0.200
3    131.600
4      0.500
5     49.600
6      0.500
7    183.100
8      0.200
9   1307.100
10     0.200
11     2.100
12    12.100
13     8.100
14   649.100
15     0.200
16   503.600
17     0.500
18   231.100
19     0.200
20  2527.600
21     0.500
22    90.100
23     0.200

The numbers before the decimals are seconds*100. The number after the decimal is an indicator value. I'm working on a fixed ratio choice task. In which, a correct choice (marked by ".100") or an incorrect choice (marked by ".600" and always followed by "0.500") can occur. If a correct choice is chosen and a reward is given (marked by the ".100" followed by "0.200") then a timeout period occurs for 10 seconds. During that time, choices can be made (marked by ".100" or ".600"/"0.500")but no reward is given until the timeout period is over.

The problem is that I have no indication of when a timeout occurred.

How can I add indicator values for the timeout period? For example, place a "0.700" value after every "0.200" value to mark the start of timeout. Then place a "0.800" value after 10 seconds to mark the end of a timeout.

Edit: I guess a for lop would come in handy here? I want to sum the values after every 0.200 value. Once the sum of the row values exceeds 10 seconds, place a 0.800 value and then continue on to the next 0.200 value. Finally, after every 0.200, place a 0.700. The 0.700 will mark a timeout start and the 0.800 will mark a timeout (10 second) stop. This is what my column should look like...

a$`1`
1316.100
0.200
0.700
131.600
0.500
49.600
0.500
0.800
183.100
0.200
0.700
0.800
1307.100
0.200
0.700
2.100
12.100
8.100
0.800
649.100
0.200
0.700
0.800
503.600
0.500
231.100
0.200
0.700
0.800
2527.600
0.500
90.100
0.200
jc2525
  • 141
  • 10
  • 1
    Is there any reason to keep these two different values as a decimal number inside one column? I strongly recommend you split the values into the seconds part and the flag part. Otherwise you are likely to run into problems trying to exactly match decimal values which isn't always easy with floating point numbers. You should provide your example data in a [reproducible format](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and also provide the desired output so possible solutions can be tested and verified. – MrFlick Sep 23 '19 at 17:31
  • Thanks, I added an edit for what it should look like. I have a bunch of columns and rows to go through and that is why I was hoping for an easier way to do this. I would like the decimal flags to be left (or at least added back in at the end) to indicate a correct or incorrect response. Also, i would add my raw text file but i don't know how to do that. So, I just gave an example column. – jc2525 Sep 23 '19 at 17:57
  • 1
    Can you describe a little more about your ultimate goal / final desired output here? The "place a 0.800 value" step sounds like a potential XY problem, ie there might be a more straightforward way.(https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem) – Jon Spring Sep 23 '19 at 18:05
  • 1
    Agreed with MrFlick, it's generally against best practices to store multiple types of data in a single column. If you need to preserve the original column for backwards compatibility, that's fine, but for your analysis, it makes a lot of sense to split it into a `Time` column, and a `Response` column. That way you don't have to play games of referencing the first half and the second half of a double/string to do your analysis/manipulation. – Mako212 Sep 23 '19 at 18:21
  • Okay thanks, I can try that. Is there a for loop (or something) I can write (at least as a start) to read the rows and insert a 0.700 value after every 0.200 value? – jc2525 Sep 23 '19 at 19:53
  • It seems like `.200` is already your indicator. What can you accomplish by inserting `.700` rows that you can't with the `.200` rows? – Mako212 Sep 23 '19 at 21:11

1 Answers1

0

Because your question is still a little diffuse, I'm going to attempt to show you how I think you might go about it, in a more R-like fashion.

Here is a starting data frame from what you provided:

library(tidyverse)
library(stringr)

df1 <- structure(list(V1 = 1:23, V2 = c(1316.1, 0.2, 131.6, 0.5, 49.6, 
0.5, 183.1, 0.2, 1307.1, 0.2, 2.1, 12.1, 8.1, 649.1, 0.2, 503.6, 
0.5, 231.1, 0.2, 2527.6, 0.5, 90.1, 0.2)), class = "data.frame", row.names = c(NA, 
-23L))

First, let's tag every timeout start, and then create a grouping variable that groups by each time we start a new timeout:

df1 <- df1 %>% mutate(V3 = ifelse(grepl("\\.2$", V2), TRUE, FALSE), 
    TimeOutGroup = cumsum(V3)) 

Next, let's pull the seconds component out of V2 into its own column, and calculate time elapsed within each group.

df1 <- df1 %>% group_by(TimeOutGroup ) %>% 
    mutate(Seconds = as.numeric(gsub("(^\\d+)?\\..*", "\\1",V2))/100, 
    Time_Elapsed = cumsum(Seconds))

Givings us the below:

# A tibble: 23 x 6
# Groups:   TimeOutGroup [7]
      V1     V2 V3    TimeOutGroup Seconds Time_Elapsed
   <int>  <dbl> <lgl>        <int>   <dbl>        <dbl>
 1     1 1316.  FALSE            0   13.2         13.2 
 2     2    0.2 TRUE             1    0            0   
 3     3  132.  FALSE            1    1.31         1.31
 4     4    0.5 FALSE            1    0            1.31
 5     5   49.6 FALSE            1    0.49         1.8 
 6     6    0.5 FALSE            1    0            1.8 
 7     7  183.  FALSE            1    1.83         3.63
 8     8    0.2 TRUE             2    0            0   
 9     9 1307.  FALSE            2   13.1         13.1 
10    10    0.2 TRUE             3    0            0   

Now, we need to think about what our condition for the End of the Time Out is, and whether it really needs to be its own row.

We could define End as being any row where total elapsed time since start > 10, or where the next value is NA (meaning no subsequent values until a new Start). And then we can add Start into this column too:

df1 %>% mutate(Status = ifelse(Time_Elapsed > 10 | is.na(lead(Time_Elapsed)), "End", ""), 
    Status = ifelse(V3 == TRUE, "Start", Status))

This gives us a table like this, where we've tagged Start and End, without having to add any rows:

# A tibble: 23 x 7
# Groups:   TimeOutGroup [7]
      V1     V2 V3    TimeOutGroup Seconds Time_Elapsed Status
   <int>  <dbl> <lgl>        <int>   <dbl>        <dbl> <chr> 
 1     1 1316.  FALSE            0   13.2         13.2  End   
 2     2    0.2 TRUE             1    0            0    Start 
 3     3  132.  FALSE            1    1.31         1.31 ""    
 4     4    0.5 FALSE            1    0            1.31 ""    
 5     5   49.6 FALSE            1    0.49         1.8  ""    
 6     6    0.5 FALSE            1    0            1.8  ""    
 7     7  183.  FALSE            1    1.83         3.63 End   
 8     8    0.2 TRUE             2    0            0    Start 
 9     9 1307.  FALSE            2   13.1         13.1  End   
10    10    0.2 TRUE             3    0            0    Start 

Hopefully that's helpful!

Mako212
  • 6,787
  • 1
  • 18
  • 37