
For a sample dataframe:

df1 <- structure(list(place = c("a", "a", "b", "b", "b", "b", "c", "c", 
"c", "d", "d"), animal = c("cat", "bear", "cat", "bear", "pig", 
"goat", "cat", "bear", "goat", "goat", "bear"), number = c(5, 
6, 7, 4, 5, 6, 8, 5, 3, 7, 4)), .Names = c("place", "animal", 
"number"), row.names = c(NA, -11L), spec = structure(list(cols = structure(list(
    place = structure(list(), class = c("collector_character", 
    "collector")), animal = structure(list(), class = c("collector_character", 
    "collector")), number = structure(list(), class = c("collector_integer", 
    "collector"))), .Names = c("place", "animal", "number")), 
    default = structure(list(), class = c("collector_guess", 
    "collector"))), .Names = c("cols", "default"), class = "col_spec"), class = c("tbl_df", 
"tbl", "data.frame"))

I want to create a variable 'sum' that holds the total of the 'number' column for each 'place' (regardless of animal) and add it to the dataframe as a new column.

The command below:

df1$sum <- aggregate(df1$number, by=list(Category=df1$place), FUN=sum)

... computes the sums, but the assignment fails because aggregate() returns one row per distinct place (4 rows) while the dataframe has 11 rows, hence this error:

Error in `$<-.data.frame`(`*tmp*`, sum, value = list(Category = c("a",  : 
  replacement has 4 rows, data has 11
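
The mismatch can be seen by looking at the aggregate() result on its own. This is only a minimal sketch: it rebuilds the same data as a plain data.frame so it runs by itself, and `totals` is just an illustrative name.

```r
# Same data as df1 above, rebuilt as a plain data.frame for a self-contained example
df1 <- data.frame(
  place  = c("a", "a", "b", "b", "b", "b", "c", "c", "c", "d", "d"),
  animal = c("cat", "bear", "cat", "bear", "pig", "goat",
             "cat", "bear", "goat", "goat", "bear"),
  number = c(5, 6, 7, 4, 5, 6, 8, 5, 3, 7, 4)
)

# aggregate() collapses the data to one row per distinct place
totals <- aggregate(df1$number, by = list(Category = df1$place), FUN = sum)

nrow(totals)  # number of distinct places
nrow(df1)     # number of rows in the original data
```

Because the two row counts differ, the result cannot be assigned directly as a column.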

Any ideas how I add this extra column onto my dataframe?

KT_1

1 Answer


Since you have a tibble, here is a dplyr solution first, followed by a base R version.

Using dplyr:

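library(dplyr)

# Sum 'number' within each place and repeat that total on every row of the group.
# Assign the result back (df1 <- ...) to keep the new column.
df1 %>% 
  group_by(place) %>% 
  mutate(sum_num = sum(number))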

# A tibble: 11 x 4
# Groups:   place [4]
   place animal number sum_num
   <chr> <chr>   <dbl>   <dbl>
 1 a     cat         5      11
 2 a     bear        6      11
 3 b     cat         7      22
 4 b     bear        4      22
 5 b     pig         5      22
 6 b     goat        6      22
 7 c     cat         8      16
 8 c     bear        5      16
 9 c     goat        3      16
10 d     goat        7      11
11 d     bear        4      11

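Using base R (ave() returns a vector of the same length as its input, so the result can be assigned directly as a column):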

df1$sum_num <- ave(df1$number, df1$place, FUN = sum)

# A tibble: 11 x 4
   place animal number sum_num
   <chr> <chr>   <dbl>   <dbl>
 1 a     cat         5      11
 2 a     bear        6      11
 3 b     cat         7      22
 4 b     bear        4      22
 5 b     pig         5      22
 6 b     goat        6      22
 7 c     cat         8      16
 8 c     bear        5      16
 9 c     goat        3      16
10 d     goat        7      11
11 d     bear        4      11
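
If you would rather keep the aggregate() approach from the question, one option is to compute the per-place totals and join them back onto the original rows. This is a minimal sketch, not the only way to do it: it rebuilds the data as a plain data.frame so it runs by itself, and `totals` and `df1_sum` are illustrative names. Note that merge() returns a plain data.frame and may reorder rows by the key.

```r
# Same data as in the question, rebuilt as a plain data.frame
df1 <- data.frame(
  place  = c("a", "a", "b", "b", "b", "b", "c", "c", "c", "d", "d"),
  animal = c("cat", "bear", "cat", "bear", "pig", "goat",
             "cat", "bear", "goat", "goat", "bear"),
  number = c(5, 6, 7, 4, 5, 6, 8, 5, 3, 7, 4)
)

# One row per place, holding the total of 'number' for that place
totals <- aggregate(number ~ place, data = df1, FUN = sum)
names(totals)[names(totals) == "number"] <- "sum_num"

# Join the totals back so every original row carries its place total
df1_sum <- merge(df1, totals, by = "place")
```

For a single grouped sum, ave() or mutate() is shorter; the merge route is mainly useful when the summary table is needed separately as well.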
phiver