0

This is a similar question to reordering groups with dataframe, however differs in that there are more than two variables. Example data:

raw <- "Date          Response     ZNumber     Latency    ZPV
        2016-05-04    1            1           445.562    59.666
        2016-05-04    2            1           433.890    97.285
        2016-05-04    3            1           372.073    53.994
        2016-05-04    4            1           282.337    89.686
        2016-05-04    4            2           333.186    57.471
        2016-05-04    5            1           320.500    71.968
        2016-05-04    5            2           280.818    49.187
        2016-07-14    1            1           411.849    65.539
        2016-07-14    2            1           346.814    50.626"
data <- read.table(text=raw, header = TRUE)

Individual 'Date-Response-ZNumber' and 'Latency-ZPV' is always correctly associated. The ZNumber order per Date-Response should be defined by ascending order of Latency.

The problem in my data is that sometimes when a Date-Response has more than one ZNumber, the Latency order sometimes does not match the ZNumber order e.g. Date=2016-05-04, Response=4 has ascending order in both ZNumber and Latency whereas Date=2016-05-04, Response=5 the ZNumber is ascending while Latency is descending.

I cannot discover the correct split-apply-combine operations.


Output

What I would like performed is both ZNumber and Latency to ascend together within a 'Date-Response' group e.g. Date=2016-05-04, Response=5

"Date          Response     ZNumber     Latency    ZPV
2016-05-04    1            1           445.562    59.666
2016-05-04    2            1           433.890    97.285
2016-05-04    3            1           372.073    53.994
2016-05-04    4            1           282.337    89.686
2016-05-04    4            2           333.186    57.471
2016-05-04    5            1           280.818    49.187
2016-05-04    5            2           320.500    71.968
2016-07-14    1            1           411.849    65.539
2016-07-14    2            1           346.814    50.626"

dplyr

Numerous attempts to solve, such as below, have not worked...

library(dplyr)

data <- data %>%
group_by(Date, Response) %>%
arrange(Latency, ZNumber) %>% 
arrange(Date, Response)

or, as suggested in the above linked question...

data <- data %>%
arrange(df, group, desc(value))

with the various 'mutating joins' without success. e.g.

data <- data %>%
  group_by(Date,Response) %>%
  select(Latency) %>%
  arrange(Latency) %>% 
  arrange(Response) %>%
  full_join(data,by=c("Date","Response"))

however now has two Latency columns.


sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
other attached packages:
[1] dplyr_0.5.0
loaded via a namespace (and not attached):
[1] lazyeval_0.2.0 magrittr_1.5   R6_2.2.0       assertthat_0.1 DBI_0.5-1     
[6] tools_3.3.2    tibble_1.2     Rcpp_0.12.8 
Community
  • 1
  • 1
  • That's just the data. Order does not manipulate data points. For *Response* #5, *ZNumber* will always ascend and *Latency* descend. Sort one and the other counters. Don't believe me? Try showing your desired output so we can see if you are asking for data cell changes. – Parfait Jan 25 '17 at 03:37
  • Oops, yes - edited with output desired. Also clarified operations attempted. – statoconial membrane Jan 25 '17 at 04:37

1 Answers1

0

I have solution with data.table that works nice and easy with minimal scripting

raw <- "Date          Response     ZNumber     Latency    ZPV
        2016-05-04    1            1           445.562    59.666
2016-05-04    2            1           433.890    97.285
2016-05-04    3            1           372.073    53.994
2016-05-04    4            1           282.337    89.686
2016-05-04    4            2           333.186    57.471
2016-05-04    5            1           320.500    71.968
2016-05-04    5            2           280.818    49.187
2016-07-14    1            1           411.849    65.539
2016-07-14    2            1           346.814    50.626"
data <- read.table(text=raw, header = TRUE)
library(data.table)
data <- data.table(data)
data <- data[order(as.numeric(Latency))]
data[,new_ZNumber:=1:length(Latency),by=.(Date,Response)]
data <- data[order(Date,as.numeric(Response),as.numeric(Latency))]
data

Output :

         Date Response ZNumber Latency    ZPV new_ZNumber
1: 2016-05-04        1       1 445.562 59.666           1
2: 2016-05-04        2       1 433.890 97.285           1
3: 2016-05-04        3       1 372.073 53.994           1
4: 2016-05-04        4       1 282.337 89.686           1
5: 2016-05-04        4       2 333.186 57.471           2
6: 2016-05-04        5       2 280.818 49.187           1
7: 2016-05-04        5       1 320.500 71.968           2
8: 2016-07-14        1       1 411.849 65.539           1
9: 2016-07-14        2       1 346.814 50.626           1

Not sure why ddply isn't doing what you want but let me know if this is what you had in mind.

EDIT : Added rebuilt ZNumber called new_ZNumber per OP's request.

  • OP is asking: *so that both ZNumber and Latency are ascending together with a 'Date-Response' group*. For Response #5 in your solution, *ZNumber* descends. OP is forcing data in an order that's not possible for all records. – Parfait Jan 25 '17 at 03:40
  • yeah i saw your comment on the OPs question, you are absolutely right. But there is also nothing that can be done about it for the reasons you pointed out – JustGettinStarted Jan 25 '17 at 04:00
  • As pointed out by @Parfait, in Response group #5 both Latency and ZNumber are not ascending... – statoconial membrane Jan 25 '17 at 04:43
  • yes but if you read @Parfait closely he explained that this 'ascending' is mandatory. You cannot change it without 'manipulating' data to swap 'Znumber' – JustGettinStarted Jan 25 '17 at 04:48
  • In your original data at Response 5 ZNumber 1 has Latency of 320.500 and ZNumber 2 has Latency of 280.818. In your output they are reversed so that ZNumber 1 has Latency 280.818 and ZNumber 2 has Latency of 320.500. Which can only be achieved through manipulation. – JustGettinStarted Jan 25 '17 at 05:10
  • Yes. That is the question - what are those manipulations operations, and what order is it applied. [This](http://stackoverflow.com/questions/1296646/how-to-sort-a-dataframe-by-columns) has been somewhat useful, but has not yet solved the problem. – statoconial membrane Jan 25 '17 at 05:32
  • by manipulations we mean data compromising 'dishonest' data manipulation. You are basically actively changing ZNumber making it not useful, and it it is not not useful you should not be considering it in any analysis. – JustGettinStarted Jan 25 '17 at 05:35
  • Ahh okay, yeah I am actively changing it... So then, I guess ZNumber should be rebuilt using Latency by each Date-Response group? – statoconial membrane Jan 25 '17 at 05:42
  • If it works the way you want it to, please click the check mark beside the answer above so that this question can be closed. – JustGettinStarted Jan 25 '17 at 05:54