
I have a data frame that looks like this:

Community     Pop_Total  Median_Age  Under_5  5-9  10-14  15-19  20-24
Akutan city   NA         NA          NA       NA   NA     NA     71
Alcan Border  NA         NA          2        NA   NA     NA     NA
Alcan Border  NA         NA          NA       NA   NA     2      NA
Alcan Border  NA         NA          NA       NA   5      NA     NA
Ambler City   224        NA          NA       NA   NA     NA     NA
Ambler City   NA         NA          NA       17   NA     NA     NA

Is there a simple way to combine multiple rows that share the same Community value, across many columns at once? I've seen a few scripts that collapse duplicates in one column based on one or two data columns, but I need to do it on a larger scale: I have ~400 rows with duplicates and ~30 columns, each with a long name.

Ideally it would look like:

Community     Pop_Total  Median_Age  Under_5  5-9  10-14  15-19  20-24
Akutan city   NA         NA          NA       NA   NA     NA     71
Alcan Border  NA         NA          2        NA   5      2      NA
Ambler City   224        NA          NA       17   NA     NA     NA

I'm very new at R. Thank you!

Edit: I used the following code, but a lot of column data went missing when I collapsed it. The values in rows after the first duplicate community name disappeared; for example, the Alcan Border values for 10-14 and 15-19 became NA. Ideas?

library(dplyr)
census8 <- census7 %>%
  group_by(Community) %>%
  summarise_each(funs(sum))
Juliet R
  • Please make a [reproducible example](http://stackoverflow.com/q/5963269/903061), sharing some sample data in a copy/pasteable way. Since you tried to use `sum` it seems likely that your data is numeric (not `x`), so your example should reflect that. – Gregor Thomas May 09 '17 at 22:52
  • Sure! I will edit it now with the first few columns of my dataset. – Juliet R May 09 '17 at 22:58
  • @Gregor It has been edited with actual data – Juliet R May 09 '17 at 23:13
  • But not copy/pasteable data :( – Gregor Thomas May 09 '17 at 23:16
  • That said, now that we can see it, try adding `na.rm = TRUE` to your `summarise_each`. Should work. – Gregor Thomas May 09 '17 at 23:17
  • @Gregor thank you! It almost worked, but it turned all of my NAs into 0s, which is a bit of a problem further down the data table, since 0 is a real value for some of them (but that's ok!) – Juliet R May 09 '17 at 23:24
  • You can also try using the aggregate() function. Read the documentation ahead of time for it, as it should give you some options to work with NAs. – shu251 May 10 '17 at 04:47
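
Following up on the aggregate() suggestion, here is a minimal base R sketch. It assumes the data frame is called census7 (as in the question), that all non-Community columns are numeric, and it rebuilds the sample rows by hand; the helper name sum_or_na is made up for illustration. The data-frame interface of aggregate() is used rather than the formula interface, because the formula interface drops rows containing NA by default.

```r
# Sample data rebuilt from the question; check.names = FALSE keeps names like "5-9"
census7 <- data.frame(
  Community  = c("Akutan city", "Alcan Border", "Alcan Border", "Alcan Border",
                 "Ambler City", "Ambler City"),
  Pop_Total  = c(NA, NA, NA, NA, 224, NA),
  Median_Age = rep(NA_real_, 6),
  Under_5    = c(NA, 2, NA, NA, NA, NA),
  `5-9`      = c(NA, NA, NA, NA, NA, 17),
  `10-14`    = c(NA, NA, NA, 5, NA, NA),
  `15-19`    = c(NA, NA, 2, NA, NA, NA),
  `20-24`    = c(71, NA, NA, NA, NA, NA),
  check.names = FALSE
)

# Hypothetical helper: NA when every value is missing, otherwise the sum
# of the non-missing values (so an all-NA group does not become 0)
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

# Collapse every column except Community (column 1), one row per Community
census_agg <- aggregate(census7[-1],
                        by = list(Community = census7$Community),
                        FUN = sum_or_na)
```

The result has one row per community, with NA kept wherever a community had no value at all in a column.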

1 Answer


To keep the NAs the way you want, you could use data.table:

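library(data.table)
# For each column within each Community: return NA if every value is NA,
# otherwise sum the non-missing values
setDT(df)[, lapply(.SD, function(x) ifelse(all(is.na(x)), NA_integer_, sum(x, na.rm = TRUE))),
    by = Community]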

#      Community Pop_Total Median_Age Under_5 5-9 10-14 15-19 20-24
#1:  Akutan_city        NA         NA      NA  NA    NA    NA    71
#2: Alcan_Border        NA         NA       2  NA     5     2    NA
#3:  Ambler_City       224         NA      NA  17    NA    NA    NA
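
If you would rather stay in dplyr, the same idea can be sketched as below. This is not the code from the question: summarise_each() and funs() are deprecated in current dplyr, so the sketch assumes dplyr 1.0.0 or later and uses across() instead. The data frame name census7 comes from the question, the sample rows are rebuilt by hand, and sum_or_na is a made-up helper name.

```r
library(dplyr)

# Sample data rebuilt from the question; check.names = FALSE keeps names like "5-9"
census7 <- data.frame(
  Community  = c("Akutan city", "Alcan Border", "Alcan Border", "Alcan Border",
                 "Ambler City", "Ambler City"),
  Pop_Total  = c(NA, NA, NA, NA, 224, NA),
  Median_Age = rep(NA_real_, 6),
  Under_5    = c(NA, 2, NA, NA, NA, NA),
  `5-9`      = c(NA, NA, NA, NA, NA, 17),
  `10-14`    = c(NA, NA, NA, 5, NA, NA),
  `15-19`    = c(NA, NA, 2, NA, NA, NA),
  `20-24`    = c(71, NA, NA, NA, NA, NA),
  check.names = FALSE
)

# Hypothetical helper: NA when every value is missing, otherwise the sum
# of the non-missing values
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_real_ else sum(x, na.rm = TRUE)
}

# One row per Community; across(everything(), ...) skips the grouping column
census8 <- census7 %>%
  group_by(Community) %>%
  summarise(across(everything(), sum_or_na))
```

Using plain sum with na.rm = TRUE here would return 0 for a group whose values are all NA, which is why the helper checks for that case first.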
Mike H.