
I'd like to find the max value for each row in a column of comma-separated numbers.

Input:

A   4,6
B   4,5
C   4,4,3,4

Output:

A   4,6       6
B   4,5       5
C   4,4,3,4   4
user2904120
  • And what have you tried? There are many popular questions here covering splitting comma values to separate columns/values, e.g. http://stackoverflow.com/questions/4350440/split-a-column-of-a-data-frame-to-multiple-columns, which would be a first step I imagine. – thelatemail Jan 30 '17 at 22:56
  • I've tried str_split_fixed() and max(), just want to combine it in an elegant way – user2904120 Jan 30 '17 at 23:01
  • You could do `do.call(pmax, c(na.rm = TRUE, read.table(text = x, sep = ",", fill = TRUE)))` if `x` is that column of yours – David Arenburg Jan 30 '17 at 23:01
  • Plus 1 for @DavidArenburg: pmax/pmin is ridiculously faster than `apply(x,1,max)` – Brandon Bertelsen Jan 30 '17 at 23:04
  • Error in textConnection(text, encoding = "UTF-8") : invalid 'text' argument – user2904120 Jan 30 '17 at 23:08
  • @user2904120 - wrap it like `text=as.character(x)` - you probably have a `factor` – thelatemail Jan 30 '17 at 23:09
  • `sapply(strsplit(as.character(dat$V2),","), function(x) max(as.numeric(x)) )` seems a **lot** quicker than `read.table` - about 0.75 seconds vs 50 seconds for a 300K row `dat`. Might be worth noting if you are dealing with bigger data. – thelatemail Jan 30 '17 at 23:39
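The two base R suggestions from the comments can be put side by side as a minimal sketch. The data frame `dat` and the result column names `max_split` and `max_pmax` are assumptions made for illustration; only the column `V2` and the calls themselves come from the comments.

```r
# Example data mirroring the question; `dat`, `V1` and `V2` are assumed names.
dat <- data.frame(V1 = c("A", "B", "C"),
                  V2 = c("4,6", "4,5", "4,4,3,4"),
                  stringsAsFactors = FALSE)

# Approach 1: split each string on commas, then take the numeric max per row.
# as.character() guards against V2 being a factor.
parts <- strsplit(as.character(dat$V2), ",", fixed = TRUE)
dat$max_split <- vapply(parts, function(x) max(as.numeric(x)), numeric(1))

# Approach 2: parse the column into a table (short rows are padded with NA
# because fill = TRUE), then take the row-wise max with pmax().
wide <- read.table(text = as.character(dat$V2), sep = ",", fill = TRUE)
dat$max_pmax <- do.call(pmax, c(wide, na.rm = TRUE))

print(dat)
```

One caveat with the second approach: `read.table` decides the number of columns from the first five lines of input, so on larger data where a longer row appears later it may be necessary to pass `col.names` explicitly.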

1 Answer


Here is an option using the tidyverse, which appends each row's maximum to the end of its comma-separated string:

library(dplyr)
library(tidyr)
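# convert = TRUE makes V2 numeric, so max() compares numbers, not strings
separate_rows(df1, V2, sep = ",", convert = TRUE) %>%
        group_by(V1) %>%
        summarise(V2 = paste(c(V2, max(V2)), collapse=","))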
# A tibble: 3 × 2
#     V1        V2
#   <chr>     <chr>
#1     A     4,6,6
#2     B     4,5,5
#3     C 4,4,3,4,4

data

df1 <- structure(list(V1 = c("A", "B", "C"), V2 = c("4,6", "4,5", "4,4,3,4"
)), .Names = c("V1", "V2"), class = "data.frame", row.names = c(NA, 
-3L))
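If the maximum should sit in its own column, as in the output the question asks for, one possible variant is sketched below. It is not part of the original answer: the column name `V3` and the intermediate `maxes` are hypothetical, and it assumes each `V1` value appears in only one row.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(V1 = c("A", "B", "C"),
                  V2 = c("4,6", "4,5", "4,4,3,4"),
                  stringsAsFactors = FALSE)

# Compute one numeric max per V1 from the long form of the data.
# convert = TRUE turns the split pieces into numbers before max() runs.
maxes <- df1 %>%
  separate_rows(V2, sep = ",", convert = TRUE) %>%
  group_by(V1) %>%
  summarise(V3 = max(V2))

# Join the maxima back onto the original rows so V2 is left untouched.
result <- left_join(df1, maxes, by = "V1")
print(result)
```

Joining back onto `df1` keeps the original comma-separated string intact, at the cost of relying on `V1` as a unique key.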
akrun