2

I am trying to create a column which is the sum of the unique values of column SIZE

TYPE    SIZE
A   24522145.17
A   35359867.65
A   35359867.65
A   35359867.65
A   35359867.65
A   35359867.65
A   24522145.17
A   35359867.65
A   35359867.65
A   8527174.786

I don't want the count of unique values but the sum, that from the data above the sum will be adding (24522145.17 + 35359867.65 + 8527174.786). This after grouping by variable TYPE. Thanks!

DSan
  • 59
  • 6

4 Answers4

3

We group by 'TYPE', get the unique 'SIZE' and return with the sum of those values in summarise

library(dplyr)
df1 %>%
    group_by(TYPE) %>%
     summarise(Sum = sum(unique(SIZE), na.rm = TRUE))

-output

# A tibble: 1 x 2
  TYPE        Sum
  <chr>     <dbl>
1 A     68409188.

data

df1 <- structure(list(TYPE = c("A", "A", "A", "A", "A", "A", "A", "A", 
"A", "A"), SIZE = c(24522145.17, 35359867.65, 35359867.65, 35359867.65, 
35359867.65, 35359867.65, 24522145.17, 35359867.65, 35359867.65, 
8527174.786)), class = "data.frame", row.names = c(NA, -10L))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thanks! It worked, but is it possible to create a column and add it to the existing dataset even though the number of rows would not match? I tried mutate with your code and it didn't work. – DSan Jun 07 '21 at 19:29
  • @DSan Just change itt o `mutate` i.e. `df1 <- df1 %>% group_by(TYPE) %>% mutate(Sum = sum(unique(SIZE), na.rm = TRUE))`. It wouldn't matter about number of rows, as `sum` is just a single number and it gets recycled to the length of each group – akrun Jun 07 '21 at 19:30
2

We could also get the distinct combos and add those using count where the wt are the values we want to sum.

library(dplyr)
df1 %>%
  distinct(TYPE, SIZE) %>%
  count(TYPE, wt = SIZE)  # shortcut for group_by(TYPE) %>% summarize(n = sum(SIZE))

data

df1 <- data.frame(
  stringsAsFactors = FALSE,
              TYPE = c("A", "A", "A", "A", "A", "A", "A", "A", "A", "A"),
              SIZE = c(24522145.17,35359867.65,
                       35359867.65,35359867.65,35359867.65,35359867.65,
                       24522145.17,35359867.65,35359867.65,8527174.786)
)
Jon Spring
  • 55,165
  • 4
  • 35
  • 53
1

With your properly formatted data

tibble::tribble(
  ~TYPE, ~SIZE,
  "A",   24522145.17,
  "A",   35359867.65,
  "A",   35359867.65,
  "A",   35359867.65,
  "A",   35359867.65,
  "A",   35359867.65,
  "A",   24522145.17,
  "A",   35359867.65,
  "A",   35359867.65,
  "A",   8527174.786
) -> df

you could do (h/t @akrun)

aggregate(SIZE ~ TYPE, unique(df), sum)

returning

  TYPE     SIZE
1    A 68409188
ktiu
  • 2,606
  • 6
  • 20
1

We could use filter and !duplicated():

df1 %>% 
  group_by(TYPE) %>% 
  filter(!duplicated(SIZE)) %>% 
  summarise(sum = sum(SIZE))

Output:

  TYPE        sum
  <chr>     <dbl>
1 A     68409188.
TarJae
  • 72,363
  • 6
  • 19
  • 66