Dataset
> read.delim("df.tsv")
col1 col2 group
1 3 2 aa
2 1 1 aa
3 4 1 aa
4 4 3 aa
5 5 3 ab
6 3 2 ab
7 4 1 ab
8 2 4 ab
9 4 2 ba
10 1 4 ba
11 3 1 ba
12 4 3 ba
13 4 2 bb
14 2 3 bb
15 3 1 bb
16 1 2 bb
I want to sort the columns col1 and col2 within each of the 4 groups, in the following way:
- If the 1st character in the group name is "a", sort col1 in a descending manner, and ascending if it's "b"
- If the 2nd character in the group name is "a", sort col2 in a descending manner, and ascending if it's "b"
- Importantly, I would like that both columns are sorted simultaneously, i.e. if e.g. the group is "aa", the sorting for that group should look like this:
col1 col2 group
1 4 3 aa
2 3 2 aa
3 4 1 aa
4 1 1 aa
...
This could be accomplished by e.g. a "one row at a time" approach, first col1 and then col2, alternating for each row.
Current code and output
library(dplyr)
read.delim("df.tsv") %>%
group_by(group) %>%
arrange(ifelse(substr(group, 1,1) == "a", desc(col1), col1), # if first character in group name is "a", sort col1 in a descending manner, and ascending if it's "b"
ifelse(substr(group, 2,2) == "a", desc(col2), col2), # if second character in group name is also "a", sort also col2 in a descending manner, and ascending if it's "b"
.by_group = TRUE)
col1 col2 group
1 4 3 aa
2 4 1 aa
3 3 2 aa
4 1 1 aa
5 5 3 ab
6 4 1 ab
7 3 2 ab
8 2 4 ab
9 1 4 ba
10 3 1 ba
11 4 3 ba
12 4 2 ba
13 1 2 bb
14 2 3 bb
15 3 1 bb
16 4 2 bb
However, this does not fulfill the 3rd criterion, the "simultaneous sorting one row at a time".
Desired output
col1 col2 group
1 4 3 aa
2 3 2 aa
3 4 1 aa
4 1 1 aa
5 5 3 ab
6 4 1 ab
7 3 2 ab
8 2 4 ab
9 1 4 ba
10 4 3 ba
11 3 1 ba
12 4 2 ba
13 1 2 bb
14 3 1 bb
15 2 3 bb
16 4 2 bb
EDIT
There are a couple of answers that actually do the proposed task, so I think a tie-breaker could be that the algorithm is flexible with respect to the number of columns to sort, e.g. 3:
col1 col2 col3 group
3 2 4 aaa
1 1 2 aaa
4 1 4 aaa
4 3 1 aaa
5 3 3 aab
3 2 2 aab
4 1 1 aab
2 4 1 aab
4 2 3 aba
1 4 3 aba
3 1 2 aba
4 3 3 aba
3 2 4 abb
1 1 2 abb
4 1 4 abb
4 3 1 abb
4 2 1 baa
2 3 2 baa
3 1 2 baa
1 2 1 baa
5 3 3 bab
3 2 2 bab
4 1 1 bab
2 4 1 bab
4 2 3 bba
1 4 3 bba
3 1 2 bba
4 3 3 bba
4 2 1 bbb
2 3 2 bbb
3 1 2 bbb
1 2 1 bbb
The output should be
col1 col2 col3 group
4 3 1 aaa
3 2 4 aaa
4 1 4 aaa
1 1 2 aaa
5 3 3 aab
2 4 1 aab
4 1 1 aab
3 2 2 aab
4 2 3 aba
3 1 2 aba
4 3 3 aba
1 4 3 aba
4 1 4 abb
1 1 2 abb
4 3 1 abb
3 2 4 abb
1 2 1 baa
2 3 2 baa
3 1 2 baa
4 2 1 baa
2 4 1 bab
5 3 3 bab
4 1 1 bab
3 2 2 bab
1 4 3 bba
3 1 2 bba
4 2 3 bba
4 3 3 bba
1 2 1 bbb
3 1 2 bbb
4 2 1 bbb
2 3 2 bbb
Currently the 2 suggested solutions do not work when 3 or more columns are included, they sort based on only 2 columns.
EDIT 2
If e.g. group=='aba', the first row of this group should be the one that includes the highest value in col1; the 2nd row the one that includes the (remaining) lowest value in col2; the 3rd row the one that includes the (remaining) highest value in col3, and the 4th row is the remaining row. However, this should be flexible to allow for more than 4 rows per group, in that case the 4th row should be the one that includes the (remaining) highest value in col1; the 5th row should be the one that includes the (remaining) lowest value in col2; etc.
More details
Example: For the 2nd row of the 'aba' group, in the case that there is a tie between 2 rows for the lowest (remaining) value in col2, e.g.
row-a 3 1 4 aba
row-b 2 1 4 aba
(notice that there is a 1 in col2 in both rows), ideally then the chosen 2nd row would be row-a, since the col1 has to be sorted in a descending manner ('a') in this group, and 3>2, and for col3 4==4 anyway.
If instead
row-a 3 1 4 aba
row-b 2 1 5 aba
let the priority go col3>col2>col1, since the cycle goes col1>col2>col3... so the 2nd row would be row-b, since 5>4.
So to generalize, if there were 5 columns and the group were 'aabaa', and there is a tie for choosing the 3rd row between 2 rows:
row-a 3 2 1 3 3 aabaa
row-b 5 4 1 4 2 aabaa
(col3 == 1 in both), then the one to select would be row-a since for col5 3>2. If instead
row-a--> 3 2 1 3 3
row-b--> 5 4 1 4 3
(col5==3 in both), then choose row-b since for col4 4>3.