Consider the following dataframe:

df <- data.frame(numeric=c(1,2,3,4,5,6,7,8,9,10), string=c("a", "a", "b", "b", "c", "d", "d", "e", "d", "f"))
print(df)
   numeric string
1        1      a
2        2      a
3        3      b
4        4      b
5        5      c
6        6      d
7        7      d
8        8      e
9        9      d
10      10      f

It has a numeric variable and a string variable. I would like to create another data frame in which the string variable contains only the unique values "a", "b", "c", "d", "e", "f", and the numeric variable is the sum of the numeric values for each string in the original data frame, resulting in this data frame:

print(new_df)
   numeric string
1        3      a
2        7      b
3        5      c
4       22      d
5        8      e
6       10      f

This can be done with a for loop, but that would be rather inefficient on large datasets, so I would prefer other options. I tried the dplyr package, but did not get the expected result:

library(dplyr)
> df %>% group_by(string) %>% summarize(result = sum(numeric))
  result
1     55
  • Try `dplyr::summarise`. Your `summarise` may have been masked by `plyr::summarise`. – akrun May 07 '19 at 17:59
  • `aggregate(df$numeric, list(df$string), sum)` – G5W May 07 '19 at 17:59
  • Your solution is correct; try @akrun's suggestion. Otherwise use `tally(numeric)` instead of `summarise` (i.e. `df %>% group_by(string) %>% tally(numeric)`). – MrNetherlands May 07 '19 at 18:00

2 Answers

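This could be a function-masking issue: plyr also defines summarise and mutate, and if plyr is attached after dplyr, its versions mask dplyr's. plyr's summarise ignores the grouping, which would explain the single total of 55. We can explicitly call summarise from dplyr: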

library(dplyr)
df %>% 
    group_by(string) %>%
    dplyr::summarise(numeric = sum(numeric))
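
To check whether masking is really the cause, one option is to ask R which attached packages define `summarise`. This is only a diagnostic sketch using base/utils functions; what it prints depends on which packages you have attached and in what order.

```r
library(dplyr)

# find() lists every attached package that defines the name, in search-path order;
# if "package:plyr" comes before "package:dplyr", plyr's version is the one being called
find("summarise")

# Namespace of the function that a bare summarise() call resolves to
environmentName(environment(summarise))
```

If the second call reports plyr rather than dplyr, the `dplyr::` prefix (or attaching plyr before dplyr) should restore the grouped result.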
akrun

You can do this without loading any extra packages using tapply or aggregate.
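
A minimal sketch of both base R approaches, using the `df` from the question (the names `sums` and `new_df` are just illustrative):

```r
df <- data.frame(
  numeric = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  string  = c("a", "a", "b", "b", "c", "d", "d", "e", "d", "f")
)

# tapply returns a named vector with one sum per unique string
sums <- tapply(df$numeric, df$string, sum)

# aggregate returns a data frame with one row per unique string
new_df <- aggregate(numeric ~ string, data = df, FUN = sum)
```

`tapply` is handy when a named vector is enough; `aggregate` is closer to the requested result because it gives back a data frame with `string` and `numeric` columns.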

Greg Snow