I have two vectors, one with the (floating-point) labels and one with the values, e.g.:

x = c(100.5, 101, 100.5, 102, 99.9, 101, 100.5)
y = c(    3,   1,     1,   2,    0,   1,     0)

The result I'm looking for is the sum for each of the labels, i.e.

res = list("100.5" = 3+1, "101" = 1+1, "102" = 2)

(Ideally "99.9" is not there, as shown above; but if it is there with a count of zero that is also acceptable.)

None of the R idioms I know seem to work, so I tried a C++ style loop: use a for loop to iterate through y, grab the value from x, but then I get stuck on the "does the value already exist in res" part (to know whether to initialize a new element, or add on to the existing entry). And it just feels so wrong to be doing it this way in R!

By The Way

It needn't be a list; a named vector, or class table, are also fine. (If it was C++ I'd be using std::map<double,double>.) One of the things I need to do next is be able to merge them, and named vectors, at least, go wrong:

res1 = c(3,4,5);names(res1) = c("100.5","101","102")
res2 = c(2,4,6);names(res2) = c("99.5", "100.5", "102")
res3 = c(2,7,4,11);names(res3) = c("99.5", "100.5", "101", "102")
res1 + res2

res1 + res2 does not give me res3. Doing the same thing with list objects gives "non-numeric argument to binary operator". (https://stackoverflow.com/a/12897398/841830 shows how to sum table objects together; a similar approach might work for named vectors...)

Darren Cook
    `tapply(y, INDEX = as.factor(x), sum)` – Gregor Thomas Dec 28 '15 at 23:29
  • And, re your *By the Way*, if you want to be merging things, use data frames. Start with `res1 = data.frame(values = 3:5, labels = c("100.5","101","102"))`, etc., then use `merge` to merge them. `+` does addition, `merge` does merging. – Gregor Thomas Dec 28 '15 at 23:33

3 Answers

Base R has a family of apply functions. If you want to stay in base R, tapply is what you're looking for: it applies a function to each group defined by an index and condenses the result.

x = c(100.5, 101, 100.5, 102, 99.9, 101, 100.5)
y = c(    3,   1,     1,   2,    0,   1,     0)
tapply(y, INDEX = as.factor(x), sum)
#  99.9 100.5   101   102 
#     0     4     2     2 
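If the zero-sum label should be dropped and the result returned as a list, as the question prefers, something like the following might work. This is only a sketch: the names sums, v and res are chosen here for illustration, and it assumes that "not there" means "sum equal to zero".

```r
x <- c(100.5, 101, 100.5, 102, 99.9, 101, 100.5)
y <- c(    3,   1,     1,   2,    0,   1,     0)

# Sum y within each distinct label of x (tapply returns a 1-d array).
sums <- tapply(y, INDEX = as.factor(x), sum)

# Turn the 1-d array into a plain named numeric vector.
v <- setNames(as.vector(sums), names(sums))

# Assumption: labels whose total is zero are the ones to drop.
res <- as.list(v[v != 0])
str(res)
```

Keeping v as a named vector instead of calling as.list is equally valid if a list is not needed.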

You can also use aggregate:

aggregate(y, by = list(x), FUN = sum)
#   Group.1 x
# 1    99.9 0
# 2   100.5 4
# 3   101.0 2
# 4   102.0 2

As for your other issues, I'd strongly recommend using data frames rather than trying to do too much with named vectors. There's a lot of infrastructure for working with data frames (in Base R, merge, aggregate and many others; also the data.table and dplyr packages).
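To make the merging point concrete, here is a rough sketch of two ways to combine res1 and res2 from the question so that values with the same label are added. The column names label and value and the objects df1, df2, both, by_name and merged are made up for this example; they are not from the question.

```r
res1 <- c("100.5" = 3, "101" = 4, "102" = 5)
res2 <- c("99.5" = 2, "100.5" = 4, "102" = 6)

# Option 1: stay with named vectors. Concatenate, then sum by name.
combined <- c(res1, res2)
by_name <- tapply(combined, names(combined), sum)
by_name

# Option 2: use data frames. Stack the rows, then aggregate by label.
df1 <- data.frame(label = names(res1), value = unname(res1))
df2 <- data.frame(label = names(res2), value = unname(res2))
both <- rbind(df1, df2)
merged <- aggregate(value ~ label, data = both, FUN = sum)
merged
```

Both options group on the character labels, so the result is ordered as text rather than numerically; convert the labels with as.numeric and reorder if numeric order matters.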

As another note, using floats as labels is risky. I'd keep them as character or factor classes as much as possible to avoid bugs caused by floating-point precision.
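A small illustration of the risk, and one possible way around it. The one-decimal format in sprintf is an assumption about how precise the labels need to be; adjust it to your data.

```r
a <- 0.1 + 0.2
b <- 0.3

# The two doubles look the same but are not exactly equal.
exact_equal <- (a == b)
near_equal <- isTRUE(all.equal(a, b))

# Assumption: one decimal place is enough to identify a label.
labels <- sprintf("%.1f", c(a, b))

# Grouping on the formatted text puts both values in the same group.
totals <- tapply(c(1, 2), labels, sum)
totals
```

Here exact_equal is FALSE while near_equal is TRUE, which is why grouping on a fixed text representation is safer than grouping on the raw doubles.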

Gregor Thomas

We could also use xtabs. By default, it sums the left-hand side of the formula within each group:

xtabs(y~x)
#x
# 99.9 100.5   101   102 
#    0     4     2     2 
akrun

Maybe this:

x = c(100.5, 101, 100.5, 102, 99.9, 101, 100.5)
y = c(    3,   1,     1,   2,    0,   1,     0)

df <- data.frame(x1 = as.character(x), x2 = y, stringsAsFactors = FALSE)

keys <- unique(df$x1)
# For each distinct label k, sum the values in the rows carrying that label.
vals <- sapply(keys, function(k) sum(df[df$x1 == k, ]$x2))
vals

yielding

100.5   101   102  99.9 
    4     2     2     0 
Mike Wise