R- for all the duplicates of the same day

Question

I have a database where every day is repeated several times, so there are several rows for the same date. (Btw, I use the package lubridate).

What I want to do is :

Create a column T1 and a column T2 of the first lowest and first highest price value. T1 would return blank/NA cells except for the rows where it finds the first highest and lowest prices. However, and this is where I am stuck, I want it to consider the duplicates together. So it will be like a loop : for the first set of duplicates, find T1 and T2, then move to the second set of duplicates, etc....

newdf4<-Data %>%
mutate(T1= max(which(settle < 120)))%>%
mutate(T2=min(which(settle> 120)))

Here is how my data looks like:

 <date>     <dttm>               <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>  <lgl> 
1 2002-01-02 2002-01-10 00:00:00   118   125   125  125.  125.    55 NA     NA    
2 2002-01-02 2002-03-11 00:00:00   125     NA    NA   NA    NA      0 NA     NA    
3 2002-01-02 2002-05-10 00:00:00   128    NA    NA   NA    NA      0 NA     NA    
4 2002-01-02 2002-07-10 00:00:00   127     NA    NA   NA    NA      0 NA     NA    
5 2002-01-02 2002-09-10 00:00:00   130     NA    NA   NA    NA      0 NA     NA    
6 2002-01-02 2002-11-11 00:00:00   180    120   120  120   120      5 NA     NA

Thanks a lot in advance.

EDIT :

 dput(head(Data))
 structure(list(Date = structure(c(11689, 11689, 11689, 11689, 
 11689, 11689), class = "Date"), Echeance = structure(c(1010620800, 
 1015804800, 1020988800, 1026259200, 1031616000, 1036972800), class =     
 c("POSIXct", "POSIXt"), tzone = "UTC"), Settle = c(118, 125, 128, 127, 
 130, 180), Open = c(125, NA, NA, NA, NA, 120), Haut = c(125, 
 NA, NA, NA, NA, 120), Bas = c(124.75, NA, NA, NA, NA, 120), Close =     
 c(124.75, NA, NA, NA, NA, 120), Vol_Q = c(55, 0, 0, 0, 0, 5), Bloc_Q = c(NA, 
 NA, NA, NA, NA, NA), Trades = c(NA, NA, NA, NA, NA, NA), `Vol_€` =     
 c(343062.5, 
 0, 0, 0, 0, 30000), O.I. = c(908, 3645, 1603, 100, 157, 1210)), row.names =          
 c(NA,-6L), class = c("tbl_df", "tbl", "data.frame"))

"separately for each day of the same date"? So, you have a specific date with different days? How is this possible? Please provide the example data using `dput` and your expected outcome. — AntoniosK, Nov 15 '18 at 13:17
@AntoniosK Sorry if I didn't express myself correctly but as you can see in the head of my data posted, for the first column, there are many rows of the same day, january the 2nd, 2002. This is what I mean by same day. — Narjems, Nov 15 '18 at 13:18
For your desired output, could you actually give the expected output, not just "what is T1"? The data you shared doesn't have a `price` column - do you want to use the `Settle` column? `Settle` doesn't have any values less than 120 in your sample data, so what is T1? Since the first value is 124.75, does that mean T2 is 124.75? And also want to clarify that your desired output has just one row per date, right? — Gregor Thomas, Nov 15 '18 at 14:49
@Gregor, thank you for taking the time. I have modified my question and I hope this time it is clear. As for the price, it is the settle yes. I just tried to simplified the code. — Narjems, Nov 15 '18 at 15:43
**Can you please show the values of T1 and T2 you expect for your sample data?** Seeing your expected result makes it very clear if we understand your logic. By "set of duplicates" do you mean "each day"? Or do you mean the duplicated prices, because after your edits there are now many duplicated prices. You don't say it explicitly, but you seem to be wanting these columns added to the original data, not in a new data frame with one row per day? But if you just show your expected result, these questions will all be made clear. — Gregor Thomas, Nov 15 '18 at 15:47
@Gregor, I haved edited again my answer, and changed the settle prices such that it is clearer. I want T1 and T2 to be in the new data frame `newdf4` where for T1, the first row would be 118, since it is the one smaller, and for T2, the 2nd row would be 125. The other rows of the day for T1 and T2 would be NAs. — Narjems, Nov 15 '18 at 15:59

Gregor Thomas · Accepted Answer · 2018-11-15T19:21:47.960

1

I'd go like this. Adjust the select to include whatever columns you want.

Data %>%
  group_by(Date)  %>%
  mutate(t1_cond = Settle < 120,
         t2_cond = Settle > 120,
         T1 = if_else(row_number() == which.max(t1_cond) & t1_cond, Settle, NA_real_),
         T2 = if_else(row_number() == which.max(t2_cond) & t2_cond, Settle, NA_real_)) %>%
  select(Date, T1, T2)

# # A tibble: 6 x 3
# # Groups:   Date [1]
#   Date          T1    T2
#   <date>     <dbl> <dbl>
# 1 2002-01-02   118    NA
# 2 2002-01-02    NA   125
# 3 2002-01-02    NA    NA
# 4 2002-01-02    NA    NA
# 5 2002-01-02    NA    NA
# 6 2002-01-02    NA    NA

edited Nov 15 '18 at 19:21

answered Nov 15 '18 at 16:09

Gregor Thomas

136,190
20
167
294

Thank you @Gregor for the help. I get an error message : 'Error in mutate_impl(.data, dots) : Column `t1_cond` must be length 8 (the group size) or one, not 39983' . Do you know how to deal with it? – Narjems Nov 15 '18 at 19:12
I can't reproduce that problem, even when I added another row for another date. Can you verify that my code works as expected on the sample data you provided? If not, then we can debug that. If so, then you need to provide sample data that reproduces the problem you're now facing. – Gregor Thomas Nov 15 '18 at 19:17
It sounds a little bit like an error you might get if you loaded `plyr` after `dplyr` and you were using the wrong version of `mutate`... [see the FAQ for details](https://stackoverflow.com/q/26106146/903061). – Gregor Thomas Nov 15 '18 at 19:19
Adding another row did show me a bug where if all the values are above 125 then T1 will be the first value (or vice versa). Edited my answer to fix. – Gregor Thomas Nov 15 '18 at 19:21

R- for all the duplicates of the same day

1 Answers1