I'm trying to do a computation by Id group. I would like to use dplyr, but it's not required. In the history column, I have a string of digits (all the same length, 36). I want to take the largest (max) value, element by element, and get a single new history string for each Id. For example, for Id = 1157, the new string would be 432400000000000000000000000000000000, as those are the largest values at each position for that Id. I would like to do this for all Ids (thousands of them).

     Id                              history
1  1157 101000000000000000000000000000000000
2  1157 000000000000000000000000000000000000
3  1157 432100000000000000000000000000000000
4  1157 321000000000000000000000000000000000
5  1157 000400000000000000000000000000000000
6  1157 432100000000000000000000000000000000
7  1157 211000000000000000000000000000000000
26 1351 000000000000000000000000000000000000
27 1351 000000000000000000000000000000000000
45 1351 000000000000000000000000000000000000
46 1351 000000000000000000000000000000000000
47 1351 000000000000000000000000000000000000
48 1351 000000000000000000000000000000000000
49 1351 000000000000000000000000000000000000
50 1351 000000000000000000000000000000000000
51 1351 000000000000000000000000000000000000
52 1351 000000000000000000000000000000000000
53 1351 000000000000000000000000000000000000
54 1351 000000000000000000000000000000000000
55 1351 000000000000000000000000000000000000
nerdlyfe
  • Isn't this just max per group? `df %>% group_by(Id) %>% slice(which.max(as.numeric(history)))` ? – Ronak Shah Jul 16 '19 at 04:07
  • These two seem to answer your question: [1] https://stackoverflow.com/questions/24070714/extract-row-corresponding-to-minimum-value-of-a-variable-by-group [2] https://stackoverflow.com/questions/24558328/how-to-select-the-row-with-the-maximum-value-in-each-group – Gaurav Deval Jul 16 '19 at 04:07
  • I need it element by element, not which row is max. That example is a bit deceiving. – nerdlyfe Jul 16 '19 at 04:13
  • Ahh, I see. Can you update the post with a better example so that the difference is clear? Also, it would be helpful if you could provide the `dput` of your sample data. – Ronak Shah Jul 16 '19 at 04:16
  • Do you mean `df %>% group_by(Id) %>% mutate(new_string = max(history))`? This assigns the max value of column `history` in the group to each `Id` element. – gavg712 Jul 16 '19 at 04:59

1 Answer

We can split every history value into its individual characters to create a list column, then group_by Id and use pmax to get the maximum value at each position.

library(dplyr)
library(purrr)

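df %>%
  # split each history string into a character vector of single digits
  mutate(new_col = map(history, ~strsplit(., "")[[1L]])) %>%
  group_by(Id) %>%
  # element-wise max across the group's vectors, pasted back into one string
  summarise(temp = paste0(Reduce(pmax, new_col), collapse = ""))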

#  Id    temp                                
# <int> <chr>                               
#1 1157  432400000000000000000000000000000000
#2 1351  000000000000000000000000000000000000

strsplit returns a list of character vectors, and map wraps each result in another list, so the output would become a nested list. We avoid that with [[1L]], which makes the result of each strsplit call a plain character vector instead of a list.

new_col, however, is still a list column. Using Reduce, we compare all the new_col values within the group (Id) pairwise and keep, element by element, the maximum value using pmax.
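To make the two steps above concrete, here is a minimal sketch on three short made-up strings (the vector `x` and the variable names are illustrative only, not part of the original data). It shows what `[[1L]]` extracts and how `Reduce(pmax, ...)` folds the vectors together.

```r
# Three short illustrative strings standing in for one Id's history values
x <- c("101", "432", "004")

# strsplit always returns a list, even for a single string
nested <- strsplit("101", "")   # list of length 1
single <- nested[[1L]]          # plain character vector: "1" "0" "1"

# Splitting the whole vector gives one character vector per string
parts <- strsplit(x, "")

# Reduce applies pmax pairwise: pmax(pmax(parts[[1]], parts[[2]]), parts[[3]])
elementwise <- Reduce(pmax, parts)

# Collapse the per-position maxima back into one string
combined <- paste0(elementwise, collapse = "")
print(combined)
```

With these inputs the per-position maxima are "4", "3" and "4", so the combined string is "434".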

Another thing to note is that new_col is a list of character vectors, which means 1 is "1", 2 is "2" and so on. Ideally new_col would be a list of integer vectors for comparison purposes, but I think it does not matter here: every element is a single digit character, and for single digits the string comparison gives the same ordering as the integer comparison. A few quick checks:

"2" > "1"
#[1] TRUE
"6" < "1"
#[1] FALSE
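If you would rather not rely on character comparison at all, one option is to convert the digits to integers before calling pmax. The following is a sketch under that assumption; `elementwise_max` and `df_small` are hypothetical names introduced here for illustration, and the grouping uses base R's tapply so the example runs without extra packages.

```r
# Hypothetical helper: element-wise max over a character vector of
# equal-length digit strings, comparing the digits as integers
elementwise_max <- function(h) {
  digits <- lapply(strsplit(h, ""), as.integer)  # list of integer vectors
  paste0(Reduce(pmax, digits), collapse = "")    # maxima pasted into one string
}

# Small made-up data frame for illustration
df_small <- data.frame(
  Id = c(1L, 1L, 2L),
  history = c("1010", "0432", "0000"),
  stringsAsFactors = FALSE
)

# Apply the helper per Id; the result is a named character vector
result <- tapply(df_small$history, df_small$Id, elementwise_max)
print(result)
```

In the dplyr pipeline, the same helper should slot in as `summarise(temp = elementwise_max(history))` after `group_by(Id)`, which also removes the need for the intermediate list column.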

Using the same logic in base R, this would be

stack(lapply(split(strsplit(df$history, ""), df$Id), function(x) 
              paste0(Reduce(pmax, x), collapse = "")))

#                                values  ind
#1 432400000000000000000000000000000000 1157
#2 000000000000000000000000000000000000 1351

data

df <- structure(list(Id = c(1157L, 1157L, 1157L, 1157L, 1157L, 1157L, 
1157L, 1351L, 1351L, 1351L, 1351L, 1351L, 1351L, 1351L, 1351L, 
1351L, 1351L, 1351L, 1351L, 1351L), history = 
c("101000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
"432100000000000000000000000000000000", 
"321000000000000000000000000000000000", 
"000400000000000000000000000000000000", 
"432100000000000000000000000000000000", 
"211000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
 "000000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
"000000000000000000000000000000000000", 
"000000000000000000000000000000000000")), row.names = c("1", 
"2", "3", "4", "5", "6", "7", "26", "27", "45", "46", "47", "48", 
"49", "50", "51", "52", "53", "54", "55"), class = "data.frame")
Ronak Shah
  • That was pretty epic. Can you perhaps explain some of the complexity? Specifically, `map(history, ~strsplit(., "")[[1L]])` and the `Reduce`. The code worked very well. Thank you! – nerdlyfe Jul 16 '19 at 05:22
  • @ElChapo Added some explanation. Hope you find it useful. – Ronak Shah Jul 16 '19 at 05:37