0

I need to aggregate rows of a data frame with MANY rows in this manner:

Lets say I have two rows that I want to merge (as they have the same first element):

  x1=c(ID1,1,3,2,1,NA);
  x2=c(ID1,2,2,3,NA,NA);

I want to merge their values so, that the result will in the case of clash (clash=both rows have a non-NA value in the element) prefer the value in a row with higher value in 2nd element (that is x2).

So in the example the result of the aggregation of these particular two rows will be:

x12=c(ID1,2,2,3,1,NA)

Can anyone please help? I tried aggregate(), but it offers just summary statistics like mean, max etc.

rafa.pereira
  • 13,251
  • 6
  • 71
  • 109
marek_sebo
  • 31
  • 4
  • can there be more than 1 ID1 at one dataframe or is it unique? can you give a sample of the data (the 2 dataframes)? it is not clear if you want a function to get x1 and x2 as input and output x12 or a solution for both dataframes – Zahiro Mor May 19 '16 at 12:56
  • Please provide sample data that represents what you want to do. You say you want to "merge" two data.frames by rows, but have only provided two vectors. See the following for tips on producing a minimum example(http://stackoverflow.com/help/mcve) as well as this one for creating [examples in R](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). – lmo May 19 '16 at 12:56

2 Answers2

1

You can use max as a function for the aggregate function.

Data:

x1=c('ID1',1,2,1,NA)
x2=c('ID1',3,3,NA,NA)

data <- data.frame(rbind(x1, x2), stringsAsFactors = FALSE)

Aggregating:

aggregate(data[,2:5], by = list(name = data$X1), FUN = max, na.rm = TRUE)

Result:

  name X2 X3 X4   X5
1  ID1  3  3  1 <NA>
elevendollar
  • 1,115
  • 10
  • 19
  • I would like to merge following the rule "if there is a non-NA value in the x2 vector, then keep that value. otherwise keep the value from x1" . that is with the inputs: 'x1=c('ID1',3,2,1,NA) x2=c('ID1',2,3,NA,NA)' the output will be 'x12=c(ID1,2,3,1,NA)' (my real dataset is of course much bigger than that). – marek_sebo May 30 '16 at 08:22
  • Please see the comment of @lmo first and please edit your original question to reflect what you are looking for. – elevendollar May 31 '16 at 11:08
0

You can use SQL via the package sqldf.

Using sql you can group by ID and select the max of the values you mentioned.

require(data.table)
require(sqldf)
x1 <- c(1,1,2,1,NA)
x2 <- c(1,3,3,NA,NA)
x  <- data.table(rbind(x1,x2))
colnames(x) <- c("ID", "x1", "x2", "x3", "x4")

sqldf("select ID, max(x1) as x1, max(x2) as x2, max(x3) as x3, max(x4) as x4 from x group by ID")

  ID x1 x2 x3 x4
1  1  3  3  1 NA
Hack-R
  • 22,422
  • 14
  • 75
  • 131