
I have a data.frame of entries. I also have a data.frame for each week that has associated counts for the entries. No weekly count data.frame contains every entry, though, so the original list is a superset of each of them.

What I'd like to do is combine these so that I have a data.frame where the first column is the entry and the next N columns are the N week counts where if an entry does not have a count for that week, then it is considered 0.

My first attempt looked like this:

append_week_counts_to_entries <- function(entries) {
  entries$week1 <- apply(entries,1,helpfunc,row=row,week=count_week1)
  entries$week2 <- apply(entries,1,helpfunc,row=row,week=count_week2)
# ... to all N weeks
  return(entries)
}

helpfunc <- function(entries,row,week) {
  if(as.character(row[1]) %in% week$id) {
    return(week[which(as.character(week$id) == as.character(row[1])),2])
  }
  else {
    return(0)
  }
}

(This worked until I abstracted it to how it looks now. I'd rather learn how it could work than keep the poor way of writing it I had before)

Besides not working as is, I also have a feeling that this is very inefficient for R. Help on both fronts would be much appreciated.

Edit: An example dataset would be:

entries: structure(list(`entries$id` = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10
)), .Names = "entries$id", row.names = c(NA, -10L), class = "data.frame")

count_week_i: structure(list(Var1 = structure(1:3, .Label = c("1", "2", "3"
), class = "factor"), Freq = c(1L, 2L, 4L)), .Names = c("Var1", 
"Freq"), row.names = c(NA, -3L), class = "data.frame")
Gavin Simpson
user592419
  • Post some data to get some answers. – Ramnath Nov 11 '11 at 04:40
  • Where do `row` and `count_week1` come from? Try showing us the structure of each data frame using `dput`. – joran Nov 11 '11 at 04:45
  • @joran row is meant to be the row from entries. count_week1 is already declared in the ws as the df with two columns for that week (one column has the id's, the other the corresponding counts). Ramnath, I am unsure what you would be looking for in terms of data. They are fairly basic sets, one with just id's for all entries (named entries) and many with id's and counts (each week) – user592419 Nov 11 '11 at 05:24
  • @user592419 The exact structure of your data frames is very important for deciding how to solve data manipulation problems like this, and the easiest way for you to communicate that to us is by showing some examples using `dput`. – joran Nov 11 '11 at 05:49
  • @joran, I am unsure what you want from the dput. The result of just running dput(entries) is quite large. Can you let me know? Thanks. – user592419 Nov 11 '11 at 05:58
  • `dput(head(...))` would be helpful, or even better, you could construct a small example data set that illustrates your problem, as is generally [recommended](http://stackoverflow.com/q/5963269/324364). – joran Nov 11 '11 at 06:06

1 Answer


Indeed, advanced use of lapply and its family is somewhat complex. I had to ask a similar question once or twice myself...

HTH: Using lapply with changing arguments

and

Running lagged regressions with lapply and two arguments

I particularly liked expand.grid.
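For the data in the question, a minimal sketch of the lapply idea might look like the following. It is only one possible approach, not necessarily what the linked questions do. It assumes the id column of entries is simply called id (the dput output names it `entries$id`), that every weekly data.frame has the Var1/Freq layout shown in the question, and that each id appears at most once per week. count_week2, the weeks list and lookup_counts are hypothetical names made up for illustration.

```r
# Example data modelled on the question's dput output.
# Assumption: the id column is named `id` rather than `entries$id`.
entries <- data.frame(id = 1:10)
count_week1 <- data.frame(Var1 = factor(c("1", "2", "3")),
                          Freq = c(1L, 2L, 4L))
# Hypothetical second week, invented for this sketch.
count_week2 <- data.frame(Var1 = factor(c("2", "5")),
                          Freq = c(3L, 7L))

# Put the weekly data frames in a named list so lapply can iterate over them.
weeks <- list(week1 = count_week1, week2 = count_week2)

# For one week, look up every entry id at once.
# match() returns NA for ids missing from that week; those become 0.
lookup_counts <- function(week, ids) {
  pos <- match(as.character(ids), as.character(week$Var1))
  counts <- week$Freq[pos]
  counts[is.na(counts)] <- 0L
  counts
}

# One vector of counts per week, then bind them next to the ids.
week_cols <- lapply(weeks, lookup_counts, ids = entries$id)
result <- cbind(entries, as.data.frame(week_cols))
```

Because match() works on the whole id vector in one call, there is no need for a row-by-row apply, which should also help with the efficiency concern. Adding another week only means adding another element to the weeks list.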

Matt Bannert
  • Thanks. expand.grid doesn't seem to be what I was looking for, but I can see how useful it could be. I actually couldn't get this to work just right, so instead I wrote a quick job outside of R and then read the result in. – user592419 Nov 12 '11 at 00:59