How do you collapse multiple rows based on multiple columns in R?

Question

So basically I have a dataframe that kinda looks like this:

Community Pop_Total Median_Age Under_5 5-9 10-14 15-19 20-24 
Akutan city   NA    NA         NA      NA  NA    NA    71
Alcan Border  NA    NA         2       NA  NA    NA    NA               
Alcan Border  NA    NA         NA      NA  NA    2     NA            
Alcan Border  NA    NA         NA      NA  5     NA    NA
Ambler City   224   NA         NA      NA  NA    NA    NA
Ambler City   NA    NA         NA      17  NA    NA    NA

Is there a simple way to combine multiple rows based on the data in multiple columns? I've seen a few scripts that combine duplicates of one variable based on one or two data columns, but I need to do it on a larger scale: I have ~400 rows with duplicates and ~30 columns, and each column has a long name.

Ideally it would look like:

Community Pop_Total Median_Age Under_5 5-9 10-14 15-19 20-24 
Akutan city   NA    NA         NA      NA  NA    NA    71              
Alcan Border  NA    NA         2       NA  5     2     NA            
Ambler City   224   NA         NA      17  NA    NA    NA

I'm very new to R. Thank you!

Edit: I used the following code, but a lot of the column data went missing when I collapsed it. Values in the rows after the first duplicate of a community name disappeared (for example, the Alcan Border values for 10-14 and 15-19 became NA). Any ideas?

library(dplyr)
census8 <- census7 %>%
  group_by(Community) %>%
  summarise_each(funs(sum))

Please make a [reproducible example](http://stackoverflow.com/q/5963269/903061), sharing some sample data in a copy/pasteable way. Since you tried to use `sum` it seems likely that your data is numeric (not `x`), so your example should reflect that. — Gregor Thomas, May 09 '17 at 22:52
Sure! I will edit it now with the first few columns of my dataset. — Juliet R, May 09 '17 at 22:58
That said, now that we can see it, try adding `na.rm = TRUE` to your `summarise_each`. Should work. — Gregor Thomas, May 09 '17 at 23:17
@Gregor thank you! It almost worked, but it turned all of my NAs into 0s, which is a bit of a problem further down the data table since 0 is a real value for some of them (but that's ok!) — Juliet R, May 09 '17 at 23:24
You can also try using the aggregate() function. Read the documentation ahead of time for it, as it should give you some options to work with NAs. — shu251, May 10 '17 at 04:47
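The behaviour discussed in these comments can be reproduced in base R. The sketch below is only an illustration: the sample data is retyped from the first few columns of the table in the question, and `sum_or_na` is a hypothetical helper name, not something from the original code. It shows why plain `sum` returns NA, why `na.rm = TRUE` turns all-NA groups into 0, and one way `aggregate()` might be used to keep NA for groups that have no values at all.

```r
# Sample data retyped from the question's table (first few columns only)
census7 <- data.frame(
  Community = c("Akutan city", "Alcan Border", "Alcan Border", "Alcan Border",
                "Ambler City", "Ambler City"),
  Pop_Total = c(NA, NA, NA, NA, 224, NA),
  Under_5   = c(NA, 2, NA, NA, NA, NA),
  `10-14`   = c(NA, NA, NA, 5, NA, NA),
  check.names = FALSE  # keep "10-14" as a column name
)

# Why the earlier attempts misbehave
sum(c(2, NA, NA))             # NA: any NA in the group makes the sum NA
sum(c(NA, NA), na.rm = TRUE)  # 0: an all-NA group collapses to zero

# Hypothetical helper: NA if the whole group is NA, otherwise sum the rest
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

# Collapse every data column by Community using the data-frame interface
# of aggregate(), which passes NAs in the data columns through to FUN
collapsed <- aggregate(census7[-1],
                       by = list(Community = census7$Community),
                       FUN = sum_or_na)
collapsed
```

The formula interface of `aggregate()` drops rows containing NA by default, which is why this sketch uses the `by = list(...)` form instead.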

score 1 · Accepted Answer · answered May 09 '17 at 23:35

To keep the NAs the way you want, you could use data.table:

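library(data.table)
# For each column within a Community: return NA if every value is NA,
# otherwise sum the non-missing values
setDT(df)[, lapply(.SD, function(x) if (all(is.na(x))) NA_integer_ else sum(x, na.rm = TRUE)),
    by = Community]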

#      Community Pop_Total Median_Age Under_5 5-9 10-14 15-19 20-24
#1:  Akutan_city        NA         NA      NA  NA    NA    NA    71
#2: Alcan_Border        NA         NA       2  NA     5     2    NA
#3:  Ambler_City       224         NA      NA  17    NA    NA    NA
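
Since the question started from dplyr, the same idea can be sketched there as well. This is an assumption-laden illustration, not part of the original answer: it relies on `across()`, which is available in dplyr 1.0.0 and later (`summarise_each()` and `funs()` are deprecated), the sample data is retyped from a few columns of the question's table, and `sum_or_na` is a hypothetical helper name.

```r
library(dplyr)

# Sample data retyped from the question's table (a few columns only)
census7 <- data.frame(
  Community = c("Akutan city", "Alcan Border", "Alcan Border", "Alcan Border",
                "Ambler City", "Ambler City"),
  Pop_Total = c(NA, NA, NA, NA, 224, NA),
  Under_5   = c(NA, 2, NA, NA, NA, NA),
  `15-19`   = c(NA, NA, 2, NA, NA, NA),
  check.names = FALSE  # keep "15-19" as a column name
)

# Hypothetical helper: NA if the whole group is NA, otherwise sum the rest
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

# One row per Community; everything() skips the grouping column here
census8 <- census7 %>%
  group_by(Community) %>%
  summarise(across(everything(), sum_or_na))
census8
```

Because the helper is applied to every non-grouping column, the ~30 long column names never need to be typed out.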
