
I have some R code that does what I want it to do. My question is: is there a way to avoid writing out A1, A2, A3 and so on? I would like to write something like A* for all columns beginning with A. The number of "A" columns varies with the length of a list that is defined earlier in the code. The rest of the code is dynamic, but here I have to intervene manually (adding or deleting A columns in the summarise statement).

I have found summarize_at, but I don't see how to apply last() and sum() to the other columns in the same call.

  l_af <- l_cf %>%
    group_by(PID, Server) %>%
    summarise(Player=last(Player),
              Guild=last(Guild),
              Points=last(Points),
              Battles=last(Battles),
              A1=max(A1),
              A2=max(A2),
              A3=max(A3),
              A4=max(A4),
              A5=max(A5),
              A6=max(A6),
              RecCount=sum(RecCount))

Any help is appreciated.

Ronak Shah
rama1065
  • Welcome to StackOverflow! Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). This will make it much easier for others to help you. – Ronak Shah Sep 22 '19 at 14:10

1 Answer


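The problem with summarise is that it drops every column that is neither a grouping column nor summarised, so a summarise_at call for the A columns cannot be combined with last() and sum() on the other columns. One option is to perform all the operations with mutate first, which keeps every row and makes each column constant within its group, and then collapse each group to a single row with summarise.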

library(dplyr)

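l_cf %>%
  group_by(PID, Server) %>%
  # last value per group for the descriptive columns
  mutate_at(vars(Player, Guild, Points, Battles), last) %>%
  # maximum per group for every column whose name starts with "A"
  mutate_at(vars(starts_with("A")), max) %>%
  # group total
  mutate(RecCount = sum(RecCount)) %>%
  # every column is now constant within its group, so max() just returns that value
  summarise_all(max)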
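If dplyr 1.0.0 or later is available (it was released after this answer was written), across() lets a single summarise() call apply different functions to different sets of columns, so neither the mutate steps nor the final summarise_all(max) are needed. Below is a minimal sketch; only the column names come from the question, and the l_cf values are invented for illustration. It uses starts_with("A", ignore.case = FALSE) because starts_with() ignores case by default.

```r
library(dplyr)  # across() and .groups require dplyr >= 1.0.0

# Hypothetical toy data: column names follow the question, values are invented.
l_cf <- data.frame(
  PID      = c(1, 1, 2, 2),
  Server   = "s1",
  Player   = c("p1", "p1", "p2", "p2"),
  Guild    = c(TRUE, FALSE, FALSE, TRUE),
  Points   = c(10, 12, 5, 7),
  Battles  = c(1, 2, 1, 3),
  A1       = c(0, 1, 2, 0),
  A2       = c(3, 1, 0, 4),
  A3       = c(5, 2, 2, 1),
  RecCount = c(1, 2, 1, 1),
  stringsAsFactors = FALSE
)

l_af <- l_cf %>%
  group_by(PID, Server) %>%
  summarise(
    # last value per group for the descriptive columns
    across(c(Player, Guild, Points, Battles), last),
    # max per group for every column starting with an upper-case "A",
    # however many such columns exist
    across(starts_with("A", ignore.case = FALSE), max),
    RecCount = sum(RecCount),
    .groups = "drop"
  )
```

Because last() returns values of the original type, a logical column such as Guild stays TRUE/FALSE in this version.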

A reproducible example

set.seed(123)
df <- data.frame(group = rep(1:5, 2), x = runif(10), y = runif(10), 
                 a1 = runif(10), a2 = runif(10), z = runif(10))

First, apply the functions to each column individually:

df %>%
  group_by(group) %>%
  summarise(x=last(x),
            y=last(y),
            a1=max(a1),
            a2=max(a2),
            z=sum(z))

# A tibble: 5 x 6
#  group      x      y    a1    a2     z
#  <int>  <dbl>  <dbl> <dbl> <dbl> <dbl>
#1     1 0.0456 0.900  0.890 0.963 0.282
#2     2 0.528  0.246  0.693 0.902 0.648
#3     3 0.892  0.0421 0.641 0.691 0.880
#4     4 0.551  0.328  0.994 0.795 0.635
#5     5 0.457  0.955  0.656 0.232 1.01 

Now apply the same functions to several columns at once:

df %>%
  group_by(group) %>%
  mutate_at(vars(x, y), last) %>%
  mutate_at(vars(starts_with("a")), max) %>%
  mutate(z = sum(z)) %>%
  summarise_all(max)


#  group      x      y    a1    a2     z
#  <int>  <dbl>  <dbl> <dbl> <dbl> <dbl>
#1     1 0.0456 0.900  0.890 0.963 0.282
#2     2 0.528  0.246  0.693 0.902 0.648
#3     3 0.892  0.0421 0.641 0.691 0.880
#4     4 0.551  0.328  0.994 0.795 0.635
#5     5 0.457  0.955  0.656 0.232 1.01 

We can see that both the approaches gave the same output.
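Instead of comparing the printed tibbles by eye, the two results can be checked with all.equal(). The sketch below rebuilds the example above; the conversion with as.data.frame() is a precaution so that only column names and values are compared, not tibble-specific attributes.

```r
library(dplyr)

set.seed(123)
df <- data.frame(group = rep(1:5, 2), x = runif(10), y = runif(10),
                 a1 = runif(10), a2 = runif(10), z = runif(10))

# Column-by-column summary
res_manual <- df %>%
  group_by(group) %>%
  summarise(x = last(x), y = last(y), a1 = max(a1), a2 = max(a2), z = sum(z))

# mutate_at first, then collapse each group with summarise_all(max)
res_mutate <- df %>%
  group_by(group) %>%
  mutate_at(vars(x, y), last) %>%
  mutate_at(vars(starts_with("a")), max) %>%
  mutate(z = sum(z)) %>%
  summarise_all(max)

# Returns TRUE when both pipelines produce the same columns and values
all.equal(as.data.frame(res_manual), as.data.frame(res_mutate))
```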

Ronak Shah
  • Thank you very much! That solves my problem. By the way, in my example "Guild" contains TRUE or FALSE values; after the mutate I got 1 or 0. I had to recode them, but it works. – rama1065 Sep 23 '19 at 03:22
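The 1/0 values most likely come from the final summarise_all(max) rather than from the mutate steps: max() on a logical vector returns an integer. Because every non-grouping column is already constant within its group after the mutate steps, collapsing with first() instead of max() should keep the original column types. This is a minimal sketch with a hypothetical df (the columns group, flag and a1 are invented), and it assumes no other column still varies within a group.

```r
library(dplyr)

# max() on a logical vector returns an integer
max(c(TRUE, FALSE))         # returns 1
class(max(c(TRUE, FALSE)))  # returns "integer"

# Hypothetical toy data with a logical column (values invented)
df <- data.frame(group = c(1, 1, 2, 2),
                 flag  = c(TRUE, FALSE, FALSE, TRUE),
                 a1    = c(3, 5, 2, 4))

res <- df %>%
  group_by(group) %>%
  mutate_at(vars(flag), last) %>%
  mutate_at(vars(starts_with("a")), max) %>%
  # each non-grouping column is now constant within its group,
  # so first() collapses it without changing its type
  summarise_all(first)
```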