R: Combining identical rows into one (preferably using dplyr/tidyr)

Question

I have an issue I can't seem to figure out

I have a data frame like this

df <- data.frame(c(rep_len("a",3), "b", "b"), c(rep_len(55, 3), 44, 44),c(rep_len(12, 3), 6, 6), c("na", 2, "na", 3, "na"), c("na", "na", 4, "na", 8), c(5, "na", "na", "na", "na"))
names(df) <- c("street", "latitude", "longitude", "A", "B", "C")

street latitude longitude     A   B   C
    a       55         12    na  na   5
    a       55         12     2  na  na
    a       55         12    na   4  na
    b       44          6     3  na  na
    b       44          6    na   8  na

and I guess what I'm looking for is a way to collapse rows with identical values in 'street', 'latitude', 'longitude' so the data frame looks like this

street latitude longitude     A   B   C
     a       55        12     2   4   5
     b       44         6     3   8  na

My best attempt is this:

df %>%
  group_by(street) %>%
  summarise_each(funs(first))

But its not quite right. Have any ideas?

One way is to melt your wide table into a long one, remove NAs and cast it into a wide one. Look for *reshape2* package or similar. Something tells me you might consider redesigning the way you deal with data. — mlt, Sep 08 '15 at 19:43

score 5 · Answer 1 · answered Sep 08 '15 at 19:53

I don't understand why you have "na" strings - R has an NA for characters/factors. Anyway, for your example, perhaps you're looking for this:

library(data.table)
dt = as.data.table(df) # or convert in place using setDT

dt[, lapply(.SD, function(x) x[x != "na"]), by = .(street, latitude, longitude)]
#   street latitude longitude A B  C
#1:      a       55        12 2 4  5
#2:      b       44         6 3 8 NA

user295691 · Answer 2 · 2015-09-09T13:33:53.243

To expand on @mlt's comment, you can use tidyr (a successor to reshape2) to reshape this. It would look like

df %>%
  gather(type, value, -c(street, latitude, longitude)) %>%
  na.omit %>%
  spread(type, value)

This spreads out the A/B/C columns into rows, omits the NA fields, and then spreads them back out.

As noted by @eddi, you'll want to use the built-in NA value rather than the string "na". I used

dfs <- 'street latitude longitude     A   B   C
    a       55         12    NA  NA   5
    a       55         12     2  NA  NA
    a       55         12    NA   4  NA
    b       44          6     3  NA  NA
    b       44          6    NA   8  NA
'
df <- read.table(text=dfs, header=T)

ulfelder · Accepted Answer · 2015-09-08T20:37:21.843

1

This works without reshaping and using just dplyr, as long as you use the standard NA in place of your "na" and specify stringsAsFactors=FALSE when creating df:

df %>%
  group_by(street, latitude, longitude) %>%
  summarise_each(funs(ifelse(sum(is.na(.)==FALSE)==0, NA, .[which(is.na(.)==FALSE)])), matches("[A-Z]{1}"))

# Result
  street latitude longitude A B  C
1      a       55        12 2 4  5
2      b       44         6 3 8 NA

If you'd prefer to stick with "na", then this works:

df %>%
  group_by(street, latitude, longitude) %>%
  summarise_each(funs(ifelse(sum(.!="na")==0, "na", .[which(.!="na")])), matches("[A-Z]{1}"))

edited Sep 08 '15 at 20:37

answered Sep 08 '15 at 20:23

ulfelder

5,305
1
22
40

2

Another option would be something like `summarise_each(funs(first(.[!is.na(.)])))` or `summarise_each(funs(.[. != "na"][1]))` – aosmith Sep 08 '15 at 21:19
@aosmith That's also how I would do it. – Steven Beaupré Sep 09 '15 at 16:44

R: Combining identical rows into one (preferably using dplyr/tidyr)

3 Answers3