iterating table() results into matrix/data frame

Question

This must be simple, but I've been banging my head against it for a while. Please help. I have a large data set from which I extract all kinds of information via table(). I then want to store these counts together with the row names that were counted. For a reproducible example, consider:

a <- c("a", "b", "c", "d", "a", "b")  # one count: a and b occur twice,
                                      # c and d occur once
b <- c("a", "c")  # a completely different property from the dataset,
                  # occurring once for a and c
x <- table(a)
y <- table(b)  # so now x and y hold the information I seek

How can I merge/bind/whatever to get from x and y to this form:

   x  y
a  2  1
b  2  0
c  1  1
d  1  0

HOWEVER, I need the solution to work iteratively: in a loop that takes x and y, produces the form above, and then has more tables added, each ideally adding a column. One of my many failed attempts, just to show my (probably flawed) logic, is:

member <- function (data = dfm, groupvar = 'group', analysis = kc15) {
  res<-matrix(NA,ncol=length(analysis$size)+1) #preparing an object for the results
  res[,1]<-table(docvars(data,groupvar)) #getting names and totals of groups
  for (i in 1:length(analysis$size)) { #getting a bunch of counts that I care about
    r<-table(docvars(data,groupvar)[analysis$cluster==i])
    res<-cbind(res,r) #here's the problem, trying to add each new count as a column.
  }
  res
}

To sum up: the reproducible example above is meant to replicate the first column in res and one r. I'm looking for (I think) a correct replacement for the cbind, one that allows adding columns of different lengths but with matching names, as in the example above. Please help; it's embarrassing how much time I'm wasting on this.

Hi Shoulda, the example is not quite reproducible, because we don't have the `docvars` function nor the `dfm` data. Can you provide those? Possibly with `dput(head(dfm))`. See [How to make a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for more info. — Ian Campbell, Mar 27 '20 at 13:17

Edward · Answer 1 · 2020-03-28T02:01:33.560

One option is to convert the frequency tables to data frames and merge them on their row names:

df <- merge(as.data.frame(x, row.names = 1, responseName = "x"),
            as.data.frame(y, row.names = 1, responseName = "y"),
            by = "row.names", all = TRUE)
df[is.na(df)] <- 0  # names missing from one table come out of the merge as NA
df

  Row.names x y
1         a 2 1
2         b 2 0
3         c 1 1
4         d 1 0

Then, this method can be incorporated into your real data with some modification. I've made up the data since I didn't have any to work with.

set.seed(1234)
groupvar <- sample(letters[1:4], 16, TRUE)
clusters <- 1:4
cluster <- rep(clusters, each=4)

Merge the first two tables:

res <- merge(as.data.frame(table(groupvar[cluster==1]),
                           row.names=1, responseName=clusters[1]),
             as.data.frame(table(groupvar[cluster==2]),
                           row.names=1, responseName=clusters[2]),
             by="row.names", all=TRUE)

Then merge the others using your for loop.

for (i in 3:length(clusters)) { 
  r <- table(groupvar[cluster==i])
  res <- merge(res, as.data.frame(r, row.names=1, responseName = clusters[i]), 
               by.x="Row.names", by.y="row.names", all=TRUE)
}
res[is.na(res)] <- 0

res
  Row.names X1 X2 X3 X4
1         a  1  2  0  0
2         b  1  1  2  2
3         c  0  1  1  2
4         d  2  0  1  0
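
When the raw group labels and the cluster assignments are both available as equal-length vectors, as in the made-up data above, a two-way table() may avoid the merging loop altogether: it counts every group/cluster combination in one call and fills absent combinations with 0. The sketch below reuses the simulated groupvar and cluster vectors; it is an illustration on that toy data only and has not been run against the asker's dfm object.

```r
set.seed(1234)
groupvar <- sample(letters[1:4], 16, TRUE)
cluster <- rep(1:4, each = 4)

# Cross-tabulate group by cluster; combinations that never occur count as 0
tab <- table(groupvar, cluster)

# Drop the "table" class to get a plain integer matrix
# with group names as row names and cluster ids as column names
res <- unclass(tab)
res
```

A group that never occurs in any cluster gets no row here unless groupvar is first converted to a factor with all levels listed.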

It's a bit shaky since I was coding blind ( no data ), but I think it can work with a bit of tweaking. — Edward, Mar 27 '20 at 14:25
thanks I'm sure it can, but I can't seem to tweak it myself. Your code gave "Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column Called from: fix.by(by.x, x)". I tried to tweak it and run each line in the console, and this:"res <- merge(res, as.data.frame(r, row.names=1, responseName = c("core",i), by.x="Row.names", by.y="row.names", all=TRUE))" gives "Warning message: In names(ex)[3L] <- responseName : number of items to replace is not a multiple of replacement length", and the object is all wrong, with no row names, length 90 instead of 10, etc. — ShouldaKnownThis, Mar 27 '20 at 17:23
Can you provide an example of your data as asked by the comment above? It's very difficult otherwise. Edit your question. Thanks. — Edward, Mar 28 '20 at 00:38

score 1 · Accepted Answer · answered Mar 27 '20 at 13:20

Merge the transposed tables, then transpose the result back.

res <- t(merge(t(unclass(x)), t(unclass(y)), all=TRUE))
res <- `colnames<-`(res[order(rownames(res)), 2:1], c("x", "y"))
res[is.na(res)] <- 0
res
#   x y
# a 2 1
# b 2 0
# c 1 1
# d 1 0
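
For tables that already exist separately (like x and y), another possible approach is to pre-allocate a zero-filled matrix over the union of all names and fill each column by name, which extends to a loop over any number of tables. This is a sketch and not part of the answer above; the list name tabs and the helper variable keys are made up for illustration.

```r
a <- c("a", "b", "c", "d", "a", "b")
b <- c("a", "c")
tabs <- list(x = table(a), y = table(b))  # hypothetical container for the tables

# Union of all category names across the tables, sorted
keys <- sort(unique(unlist(lapply(tabs, names), use.names = FALSE)))

# Zero-filled matrix: one row per category, one column per table
res <- matrix(0L, nrow = length(keys), ncol = length(tabs),
              dimnames = list(keys, names(tabs)))

# Fill each column by name; categories absent from a table stay 0
for (nm in names(tabs)) {
  tab <- tabs[[nm]]
  res[names(tab), nm] <- as.vector(tab)
}
res
#   x y
# a 2 1
# b 2 0
# c 1 1
# d 1 0
```

Because rows are matched by name and not by position, tables of different lengths can be added in any order without NA handling afterwards.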

jay.sf


Thanks @jay.sf! It works in the console for 2 tables, but putting it in my function gives "Error in res[order(rownames(res)), 2:1] : subscript out of bounds" in the first iteration. I also tried replacing the 2 in 2:1 with the iterator i+1, with the same result. Any ideas? – ShouldaKnownThis Mar 27 '20 at 18:00
