1

I have a data frame with many variables, two of them are "year" and "country". I would like to sum rows of some columns under a condition that rows are for a country in a particular year and NA's for those that are not specified. For example:

A B C  year  country total
1 1 1  2000   IT      3
2 2 2  2001   IT      6
3 3 3  2001   DE      9
4 4 4  2002   DK      NA
5 5 5  2000   FR      NA
6 6 6  2001   DE      18

In Stata this would look like:

egen variable = rowtotal (A B C) if ///
country_year=="36_04" | country_year=="37_04" | country_year=="96_04" | ///
country_year=="97_04" | country_year=="83_04" | country_year=="83_09" | ///
country_year=="87_09" | country_year=="87_04"
Laura
  • 306
  • 3
  • 12
  • 1
    You can combine conditions with `&` for "and" or `|` for "or" in the [`ifelse` answer to your last question](http://stackoverflow.com/q/42797960/903061). So instead of `Country %in% c("IT", "DE")` you could have `Country %in% c("IT", "DE") & year %in% c(2000, 2001)` to only sum if the (country is IT or DE) and (years is 2000 or 2001). – Gregor Thomas Mar 15 '17 at 00:25
  • Thank you. However, I think the script from Stata is more precise as I need country 36 in 2004 but not in 2009. Your solution implies that R will sum rows for the country in both years. – Laura Mar 15 '17 at 00:59
  • 2
    Well then you just need to write those conditions: `(Country == "IT" & year == 2004) | (Country == "DE" & year == 2001) | ...` or whatever the values you want are. – Gregor Thomas Mar 15 '17 at 01:44

2 Answers2

1

Using dplyr, with conditions to reproduce the output in the question and assuming data frame is named df1, with no existing column named total:

library(dplyr)
df1 %>%
  filter(year < 2002, country %in% c("IT", "DE")) %>%
  group_by(year, country) %>%
  rowwise() %>%
  mutate(total = sum(A, B, C)) %>%
  right_join(df1)

Result:

     A     B     C  year country total
 <int> <int> <int> <int>   <chr> <int>
     1     1     1  2000      IT     3
     2     2     2  2001      IT     6
     3     3     3  2001      DE     9
     4     4     4  2002      DK    NA
     5     5     5  2000      FR    NA
     6     6     6  2001      DE    18
neilfws
  • 32,751
  • 5
  • 50
  • 63
1

Here is an option using data.table by specifying the 'i' with logical condition, we sum (+) the corresponding elements of columns specified in the .SDcols and assign (:=) the output to 'total'

library(data.table)
setDT(df1)[year < 2002 & country %chin% c("IT", "DE"),
        total := Reduce(`+`, .SD),  .SDcols = A:C]
df1
#   A B C year country total
#1: 1 1 1 2000      IT     3
#2: 2 2 2 2001      IT     6
#3: 3 3 3 2001      DE     9
#4: 4 4 4 2002      DK    NA
#5: 5 5 5 2000      FR    NA
#6: 6 6 6 2001      DE    18
akrun
  • 874,273
  • 37
  • 540
  • 662
  • The example data and question aren't very clear, but from the comments on the question I think OP wants to only sum specific Country/Year combinations rather than separate country and year conditions stuck together with an `&` or `|`. – Gregor Thomas Mar 15 '17 at 01:56
  • 1
    @Gregor I was also confused by the question, but then I looked at the other answer and it used `group_by`. Anyway, I removed the group by as it doesn't have any impact on the final output – akrun Mar 15 '17 at 02:00
  • Thank you. yes, I meant exactly what @Gregor said. – Laura Mar 15 '17 at 10:10