
I would like to ask for your help, guys.

Sometimes I struggle to improve a function I developed, because I don't know which of its steps could be replaced with functions like apply or merge.

The idea is that I have two tables. The first has three categorical columns, "category", "month" and "day", plus one numeric column, "quantity". The second has just "month" and "day". The first table may not have data for every day or month, and I want a row for every combination of category, month and day (with quantity 0 where there is no data).

I also need another column that combines month and day in "MMdd" format.

After hours of trying to do it the right way, I fell back on the "wrong" way, and of course R froze. This is my code:

```r
## Loop over every category and every calendar day, appending one row at a time
filldays <- function(calendar, data) {
  categories <- levels(as.factor(data$category))
  result <- data.frame()
  for (category in categories) {
    for (j in seq_len(nrow(calendar))) {
      month <- calendar$month[j]
      day <- calendar$day[j]

      ## Build the date (MMdd) string, zero-padding month and day
      m <- if (month < 10) paste("0", month, sep = "") else as.character(month)
      d <- if (day < 10) paste("0", day, sep = "") else as.character(day)
      date <- paste(m, d, sep = "")

      ## Look up the quantity in the data data.frame (0 if there is none)
      quantity <- data[data$month == month & data$day == day &
                         data$category == category, "quantity"]
      if (length(quantity) == 0) {
        quantity <- 0
      }

      ## Store the result as a new row
      line <- data.frame(category = as.character(category),
                         month = as.numeric(month), day = as.numeric(day),
                         date = as.character(date),
                         quantity = as.numeric(quantity))
      result <- rbind(result, line)
    }
  }
  result
}
```

What I am trying to achieve is something like this.

Table with data:

```
category     month     day     quantity
1            1         1       20
1            1         3       40
2            1         1       10
2            1         2       15
```

Calendar table:

```
month    day
1        1
.
.
1        31
.
.
12       31
```

Objective table:

```
category     month     day     date     quantity
1            1         1       0101     20
1            1         2       0102     0    (no data for this day)
1            1         3       0103     40
1            1         4       0104     0    (no data; and so on for every month and day of the year)
.
.
.
2            1         1       0101     10
.
.
.
```

I can't provide the real data because it is confidential, sorry. I hope this is enough to explain my problem.

I know it's a mess, but I can't come up with anything better. I don't have much experience yet with optimizing code in R.

Any help will be greatly appreciated, because right now R hangs when it tries to run this, even though the tables are not that big: 555 categories × 365 days, about 202,575 rows in the result.
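Part of why the loop hangs is that `rbind()` inside the loop copies the whole accumulated data frame on every iteration, so for roughly 200,000 rows the total work grows about quadratically. A minimal vectorized sketch, assuming `calendar` has `month` and `day` columns and `data` has `category`, `month`, `day` and `quantity` with at most one row per combination; the name `filldays_vec` and the toy tables are hypothetical:

```r
# Hypothetical vectorized version of filldays(); assumes calendar has
# columns month and day, and data has category, month, day, quantity
# with at most one row per (category, month, day).
filldays_vec <- function(calendar, data) {
  categories <- sort(unique(data$category))
  # Every category paired with every calendar row
  grid <- data.frame(
    category = rep(categories, each = nrow(calendar)),
    month    = rep(calendar$month, times = length(categories)),
    day      = rep(calendar$day, times = length(categories))
  )
  # Left join: keep all grid rows, take quantity where it exists
  result <- merge(grid, data[, c("category", "month", "day", "quantity")],
                  by = c("category", "month", "day"), all.x = TRUE)
  # Combinations with no data get 0 instead of NA
  result$quantity[is.na(result$quantity)] <- 0
  # Zero-padded MMdd string, e.g. month 1, day 3 -> "0103"
  result$date <- sprintf("%02d%02d", as.integer(result$month),
                         as.integer(result$day))
  result <- result[order(result$category, result$month, result$day),
                   c("category", "month", "day", "date", "quantity")]
  rownames(result) <- NULL
  result
}

# Toy inputs modelled on the example tables (first four days of January)
calendar <- data.frame(month = rep(1L, 4), day = 1:4)
data <- data.frame(category = c(1, 1, 2, 2), month = c(1, 1, 1, 1),
                   day = c(1, 3, 1, 2), quantity = c(20, 40, 10, 15))
res <- filldays_vec(calendar, data)
print(res)
```

Building the full grid first and joining once replaces about 200,000 single-row `rbind()` calls with a single `merge()`.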

JusefPol

  • Ummm. I think you need something pretty basic that is already well implemented in R (so there is no need to build a function like that). Would you mind providing sample data and showing what the expected output should look like? Try to follow the [best practices](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) when asking a question. – Paulo E. Cardoso Jun 03 '14 at 16:31
  • I think you just want `merge`. But as mentioned above, it's really hard to tell what you're trying to do by reading that code. Try showing some small example data, and then show how it should look at the end. – joran Jun 03 '14 at 16:33
  • Yikes, you define a variable `c` but then use the base `c()` function; which is it, a variable or a function? (Hint: it is bad to give a variable the same name as a function.) – Ellis Valentiner Jun 03 '14 at 16:35
  • @user12202013 Why is it bad to give a variable the same name as a function? R has no difficulty telling them apart. – Señor O Jun 03 '14 at 17:42
  • @SeñorO R may be able to tell them apart, but undoubtedly there will be times when someone reading your code scratches his/her head and wonders what is going on. Objects should be given meaningful names and should behave consistently. If I say `c <- 1` and then type `c`, R will return 1 BUT does not also tell me that `c()` is a function. There is little difference between assigning `c` to be a constant, a vector, or an entirely new function! It is bad because it is bad style, not because R doesn't know the difference (in this case). – Ellis Valentiner Jun 03 '14 at 18:15
  • @user12202013 OK, but typing `c()` tells you it is a function. "Style" is subjective. If you have a variable that it makes sense to name c, name it c. Nothing bad will come of it. – Señor O Jun 03 '14 at 19:00
  • @SeñorO You are right, style is subjective, but I think many people would agree that a clearer approach is to avoid assigning objects that already have a meaning. I'm not saying never to reuse an object name, but code is less clear when objects change type and purpose. Typing `c()` will execute the function with empty arguments rather than print the function; you would have to use `get("c", baseenv())`. – Ellis Valentiner Jun 03 '14 at 19:18
  • I changed the name of the `c` variable to make the code more readable. I also added an example. `merge` by itself doesn't solve the problem. – JusefPol Jun 04 '14 at 07:53
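A quick check of the behaviour debated in these comments, as a sketch meant to be run at the top level: when R evaluates a call like `c(2, 3)`, it searches for a function named `c` and skips non-function objects, so a variable called `c` does not break the call, while typing `c` alone shows the variable. The name `base_c` is hypothetical.

```r
# A non-function variable that happens to be called c
c <- 1
print(c)           # prints the variable: [1] 1
# In a call, R looks up c as a function and skips non-function
# bindings, so this still uses base::c()
x <- c(2, 3)
print(x)
# The base function can also be fetched explicitly
base_c <- get("c", baseenv())
print(identical(base_c(2, 3), x))
rm(c)              # drop the variable again
```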

2 Answers


Pretty sure this is a duplicate. There are a ton of worked examples of merge, and this looks like a case I've seen before. I'd suggest searching for: `[r] merge all.x is.na`

```r
## table2: the calendar (all of its rows are kept via all.x); table1: the data
res <- merge(table2, table1, by = c("month", "day"), all.x = TRUE)
res$quantity[is.na(res$quantity)] <- 0
```
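A minimal illustration on hypothetical toy tables, assuming `table2` is the calendar and `table1` holds the data for a single category: the left join keeps all four calendar days, and the missing days become 0 after the `is.na()` step. Because the join uses only `month` and `day`, it does not by itself create a row for every category.

```r
# table2: the calendar (every month/day); table1: data for a single category
table2 <- data.frame(month = rep(1L, 4), day = 1:4)
table1 <- data.frame(month = c(1, 1), day = c(1, 3), quantity = c(20, 40))

# all.x = TRUE keeps every calendar row; days without data get NA
res <- merge(table2, table1, by = c("month", "day"), all.x = TRUE)
# Replace the NAs with 0
res$quantity[is.na(res$quantity)] <- 0
print(res)
```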

You reduce the risk of adverse action against your SO standing if you delete your own question before it is closed as a "dupe."

IRTFM
  • Thanks for your answer, but merge doesn't solve my problem yet. The calendar table has only 365 rows (one per day of the year), but the data table has many more: with 550 categories and, for example, 30 days of data per category, it is much bigger than the calendar table. merge doesn't create the table I am looking for (the data happens to cover every day of every month, but spread across different categories). I added an example in the hope of clarifying my problem. I also looked around for merge examples but couldn't find an answer to my problem. – JusefPol Jun 04 '14 at 07:48

While replying to BondedDust, I realized what the problem was. Thanks.

I just had to create a table with the calendar repeated for every category.

```r
## Repeat the whole calendar once for each category
category <- sapply(as.character(levels(as.factor(data$category))),
                   function(x) rep(x, nrow(calendar)))
category <- as.vector(category)  # all days for the first category, then the next, ...
category <- cbind(category, calendar)  ## I get a warning about row names deleted, but everything works fine
```
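A sketch of an alternative that avoids the row-name warning, assuming the column names from the question's example tables: `merge()` with `by = NULL` returns the Cartesian product of its two arguments, so every calendar row is paired with every category, and a second `merge()` on all three keys then fills in the quantities. The toy data and the `cats` and `grid` names are hypothetical.

```r
# Toy inputs modelled on the question's tables
calendar <- data.frame(month = rep(1L, 4), day = 1:4)
data <- data.frame(category = c(1, 1, 2, 2), month = c(1, 1, 1, 1),
                   day = c(1, 3, 1, 2), quantity = c(20, 40, 10, 15))

# One row per distinct category
cats <- data.frame(category = sort(unique(data$category)))
# by = NULL: Cartesian product, every calendar row for every category
grid <- merge(calendar, cats, by = NULL)
# Left join on all three keys, then turn missing quantities into 0
res <- merge(grid, data, by = c("category", "month", "day"), all.x = TRUE)
res$quantity[is.na(res$quantity)] <- 0
print(res[order(res$category, res$month, res$day), ])
```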

After that, I applied the same merge BondedDust suggested, and then used sapply to create the "date" column:

```r
## Zero-pad month and day to two digits, then build the MMdd string
m <- sapply(res$month, function(x) if (x < 10) paste("0", x, sep = "") else as.character(x))
d <- sapply(res$day, function(x) if (x < 10) paste("0", x, sep = "") else as.character(x))
res$date <- paste(m, d, sep = "")
```
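Since `sprintf()` is vectorized, the two `sapply()` calls can likely be replaced by a single line. A sketch, assuming `month` and `day` hold whole numbers (`%02d` pads each one to two digits with a leading zero); the small `res` table here is hypothetical:

```r
res <- data.frame(month = c(1, 1, 12), day = c(1, 10, 31))
# %02d zero-pads each number to two digits; sprintf works element-wise
res$date <- sprintf("%02d%02d", as.integer(res$month), as.integer(res$day))
print(res$date)
```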

Thanks, everyone, for your help. Coming from Java or C, it is sometimes really hard to think about how to make better use of R.

JusefPol