This must be simple, but I've been banging my head against it for a while. Please help. I have a large data set from which I get all kinds of information via table(). I then want to store these counts along with the row names that were counted. For a reproducible example, consider:

a <- c("a", "b", "c", "d", "a", "b")  # one count: a and b occur twice,
                                      # c and d occur once
b <- c("a", "c")  # a completely different property from the dataset:
                  # a and c occur once each
x <- table(a)
y <- table(b)  # so now x and y hold the information I seek

How can I merge/bind/whatever to get from x and y to this form:

   x  y
a  2  1
b  2  0
c  1  1
d  1  0

However, I need the solution to work iteratively: in a loop that takes x and y, produces the form above, and then has more tables added, each ideally contributing one column. One of my many failed attempts, just to show my (probably flawed) logic, is:

member <- function (data = dfm, groupvar = 'group', analysis = kc15) {
  res<-matrix(NA,ncol=length(analysis$size)+1) #preparing an object for the results
  res[,1]<-table(docvars(data,groupvar)) #getting names and totals of groups
  for (i in 1:length(analysis$size)) { #getting a bunch of counts that I care about
    r<-table(docvars(data,groupvar)[analysis$cluster==i])
    res<-cbind(res,r) #here's the problem, trying to add each new count as a column.
  }
  res
}

To sum up, the reproducible example above is meant to replicate the first column in res and one r. I'm looking for (I think) a correct replacement for the cbind, one that allows adding columns of different lengths but with matching names, as in the example above. Please help; it's embarrassing how much time I'm wasting on this.

jay.sf
  • Hi Shoulda, the example is not quite reproducible, because we don't have the `docvars` function nor the `dfm` data. Can you provide those? Possibly with `dput(head(dfm))`. See [How to make a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for more info. – Ian Campbell Mar 27 '20 at 13:17

2 Answers

One option is to convert each frequency table to a data frame and then merge the data frames on their row names:

df <- merge(as.data.frame(x, row.names=1, responseName="x"),
            as.data.frame(y, row.names=1, responseName="y"),
            by="row.names", all=TRUE)
df[is.na(df)] <- 0
df

  Row.names x y
1         a 2 1
2         b 2 0
3         c 1 1
4         d 1 0

Then, this method can be incorporated into your real data with some modification. I've made up the data since I didn't have any to work with.

set.seed(1234)
groupvar <- sample(letters[1:4], 16, TRUE)
clusters <- 1:4
cluster <- rep(clusters, each=4)

Merge the first two tables:

res <- merge(as.data.frame(table(groupvar[cluster==1]),
                           row.names=1, responseName=clusters[1]),
             as.data.frame(table(groupvar[cluster==2]),
                           row.names=1, responseName=clusters[2]),
             by="row.names", all=TRUE)

Then merge the others using your for loop.

for (i in 3:length(clusters)) { 
  r <- table(groupvar[cluster==i])
  res <- merge(res, as.data.frame(r, row.names=1, responseName = clusters[i]), 
               by.x="Row.names", by.y="row.names", all=TRUE)
}
res[is.na(res)] <- 0

res
  Row.names X1 X2 X3 X4
1         a  1  2  0  0
2         b  1  1  2  2
3         c  0  1  1  2
4         d  2  0  1  0
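
The pairwise merge above can also be folded over any number of tables. The following is only a sketch under assumptions: `tabs` is a hypothetical named list of one-dimensional tables, and `to_df` is a helper made up for this illustration, not part of the original code. It uses an explicit `key` column instead of row names, so every merge step joins on the same column.

```r
# Hypothetical input: a named list of frequency tables; names become column names.
a <- c("a", "b", "c", "d", "a", "b")
b <- c("a", "c")
tabs <- list(x = table(a), y = table(b))

# Illustrative helper: turn one table into a data frame with a key column
# and a single count column named after the table.
to_df <- function(tab, name) {
  out <- data.frame(key = names(tab), n = as.vector(tab),
                    stringsAsFactors = FALSE)
  names(out)[2] <- name
  out
}

dfs <- Map(to_df, tabs, names(tabs))

# Full outer join of all data frames on "key", one merge per list element.
res <- Reduce(function(l, r) merge(l, r, by = "key", all = TRUE), dfs)

# Categories missing from a table come back as NA; treat them as zero counts.
res[is.na(res)] <- 0
res
```

Adding another table then only means adding another element to `tabs`; no special handling of the first two tables is needed.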
Edward
  • It's a bit shaky since I was coding blind (no data), but I think it can work with a bit of tweaking. – Edward Mar 27 '20 at 14:25
  • thanks I'm sure it can, but I can't seem to tweak it myself. Your code gave "Error in fix.by(by.x, x) : 'by' must specify a uniquely valid column Called from: fix.by(by.x, x)". I tried to tweak it and run each line in the console, and this:"res <- merge(res, as.data.frame(r, row.names=1, responseName = c("core",i), by.x="Row.names", by.y="row.names", all=TRUE))" gives "Warning message: In names(ex)[3L] <- responseName : number of items to replace is not a multiple of replacement length", and the object is all wrong, with no row names, length 90 instead of 10, etc. – ShouldaKnownThis Mar 27 '20 at 17:23
  • Can you provide an example of your data as asked by the comment above? It's very difficult otherwise. Edit your question. Thanks. – Edward Mar 28 '20 at 00:38

Merge the transposed tables, then transpose the result back.

res <- t(merge(t(unclass(x)), t(unclass(y)), all=TRUE))
res <- `colnames<-`(res[order(rownames(res)), 2:1], c("x", "y"))
res[is.na(res)] <- 0
res
#   x y
# a 2 1
# b 2 0
# c 1 1
# d 1 0
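
For more than two tables, a different technique may be easier to loop over than the transpose-and-merge above: pre-allocate a zero-filled matrix over the union of all names and fill one column per table by name. This is a sketch, assuming the tables are collected in a hypothetical named list `tabs`.

```r
# Hypothetical input: a named list of one-dimensional frequency tables.
a <- c("a", "b", "c", "d", "a", "b")
b <- c("a", "c")
tabs <- list(x = table(a), y = table(b))

# Union of all category names that occur in any table.
keys <- sort(unique(unlist(lapply(tabs, names))))

# Zero-filled result: one row per category, one column per table.
res <- matrix(0L, nrow = length(keys), ncol = length(tabs),
              dimnames = list(keys, names(tabs)))

# Fill each column by name, so tables of different lengths line up correctly.
for (j in seq_along(tabs)) {
  tab <- tabs[[j]]
  res[names(tab), j] <- as.vector(tab)
}
res
#   x y
# a 2 1
# b 2 0
# c 1 1
# d 1 0
```

Because rows are addressed by name, the column order and row order do not depend on how `merge` happens to sort its output.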
jay.sf
  • Thanks @jay.sf! It works in the console for 2 tables, but putting it in my function gives "Error in res[order(rownames(res)), 2:1] : subscript out of bounds" in the first iteration. I also tried replacing the 2 in 2:1 with the iterator i+1, with the same result. Any ideas? – ShouldaKnownThis Mar 27 '20 at 18:00
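
One way the loop from the question might be made to work with plain cbind is to fix the factor levels once, so that every table has the same length and zero counts are kept. This is a sketch only: `group` and `cluster` are made-up stand-ins for `docvars(data, groupvar)` and `analysis$cluster`, which are not available here.

```r
# Hypothetical stand-ins for docvars(data, groupvar) and analysis$cluster.
group   <- c("a", "b", "c", "d", "a", "b")
cluster <- c(1, 2, 1, 2, 2, 1)

# Fix the levels once; subsetting a factor keeps all of its levels,
# so every table below has one entry per level, including zeros.
group <- factor(group)

# First column: totals per group. c() turns the table into a named vector.
res <- cbind(total = c(table(group)))

# One extra column per cluster; all columns now have the same length.
for (i in sort(unique(cluster))) {
  res <- cbind(res, c(table(group[cluster == i])))
  colnames(res)[ncol(res)] <- paste0("cluster", i)
}
res
```

When all the counts come from splitting one vector by another, as here, `table(group, cluster)` produces the same cross-tabulation in a single call, without a loop.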