I would like to ask for your help, guys.
I am having trouble optimizing a function I wrote, and I don't know which steps could be improved with functions like apply or merge.
The idea is that I have two tables. The first has three categorical columns, "category", "month" and "day", plus one numeric column, "quantity"; the second has just "month" and "day". Because the first table may not have data for every day or month, I want a row for every combination of category, month and day (with quantity 0 where there is no data).
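For illustration, "a row for every category, month and day" is a Cartesian product of the categories with the calendar rows, which base R's `merge()` returns when called with `by = NULL`. A minimal sketch, assuming small made-up inputs shaped like the tables in this question (`data`, `calendar` and `grid` are illustrative names):

```r
# Made-up inputs shaped like the two tables in the question
data <- data.frame(category = c(1, 1, 2, 2),
                   month    = c(1, 1, 1, 1),
                   day      = c(1, 3, 1, 2),
                   quantity = c(20, 40, 10, 15))
calendar <- data.frame(month = c(1, 1, 1, 1), day = 1:4)  # first four days only

# by = NULL makes merge() return every (category, calendar row) pair
grid <- merge(data.frame(category = unique(data$category)), calendar, by = NULL)
nrow(grid)  # 2 categories * 4 days = 8
```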
I also need another column, "date", with month and day together in "MMdd" format (e.g. month 1, day 2 becomes "0102").
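The zero-padding can be done on whole columns at once with `sprintf()`, where `%02d` pads a whole number to two digits with a leading zero. A sketch with made-up month/day vectors:

```r
# Made-up month/day values; %02d zero-pads each number to two digits
month <- c(1, 1, 12)
day   <- c(2, 31, 31)
date  <- sprintf("%02d%02d", month, day)
date  # "0102" "0131" "1231"
```

This replaces both `if (month < 10)` / `if (day < 10)` branches with a single vectorised call.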
After hours of trying to do it the right way, I decided to use the "wrong" way (nested loops), and of course R froze. This is my code:
filldays <- function(calendar, data) {
  categories <- levels(as.factor(data$category))
  result <- data.frame()
  for (category in categories) {
    for (j in 1:nrow(calendar)) {
      month <- calendar$month[j]
      day <- calendar$day[j]  # the calendar column is "day", not "days"
      ## Create the data for the date (MMdd) variable, zero-padding month and day
      if (month < 10) {
        m <- paste("0", month, sep = "")
      } else {
        m <- as.character(month)
      }
      if (day < 10) {
        d <- paste("0", day, sep = "")
      } else {
        d <- as.character(day)
      }
      date <- paste(m, d, sep = "")
      ## Search the value within the data data.frame (column 4 is quantity)
      quantity <- data[data$month == month & data$day == day & data$category == category, 4]
      if (length(quantity) == 0) {
        quantity <- 0
      }
      ## Store the result in the new data.frame (rbind copies result on every call)
      line <- data.frame(as.character(category), as.numeric(month), as.numeric(day),
                         as.character(date), as.numeric(quantity))
      result <- rbind(result, line)
    }
  }
  colnames(result) <- c("category", "month", "day", "date", "quantity")
  result
}
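Part of the slowness is probably that `result <- rbind(result, line)` copies the growing data frame on every iteration, so the total work grows roughly quadratically with the ~200,000 output rows. Below is a sketch of the same loop with the output vectors allocated once up front. It assumes `data` has a `quantity` column and keeps only the first match per cell; the name `filldays_prealloc` is hypothetical. It still scans all of `data` once per output row, so it removes only the `rbind` cost:

```r
# Hypothetical variant of filldays(): same nested loops, but the output
# vectors are allocated once instead of growing the result via rbind().
filldays_prealloc <- function(calendar, data) {
  categories <- levels(as.factor(data$category))
  n <- length(categories) * nrow(calendar)
  out_cat   <- character(n)
  out_month <- numeric(n)
  out_day   <- numeric(n)
  out_qty   <- numeric(n)  # starts at 0, i.e. "no data"
  i <- 0
  for (category in categories) {
    for (j in seq_len(nrow(calendar))) {
      i <- i + 1
      month <- calendar$month[j]
      day   <- calendar$day[j]
      out_cat[i]   <- category
      out_month[i] <- month
      out_day[i]   <- day
      hit <- data$quantity[data$category == category &
                           data$month == month & data$day == day]
      if (length(hit) > 0) out_qty[i] <- hit[1]  # first match only
    }
  }
  # Build the data frame once, with the MMdd column computed in one call
  data.frame(category = out_cat, month = out_month, day = out_day,
             date = sprintf("%02d%02d", out_month, out_day),
             quantity = out_qty, stringsAsFactors = FALSE)
}

# Usage with made-up data shaped like the example tables
data <- data.frame(category = c(1, 1, 2, 2), month = c(1, 1, 1, 1),
                   day = c(1, 3, 1, 2), quantity = c(20, 40, 10, 15))
calendar <- data.frame(month = c(1, 1, 1), day = 1:3)
res <- filldays_prealloc(calendar, data)
res$quantity  # 20 0 40 10 15 0
```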
What I am trying to achieve is something like this.
Data table:
category month day quantity
1 1 1 20
1 1 3 40
2 1 1 10
2 1 2 15
Calendar table (one row per day of the year):
month day
1 1
.
.
1 31
.
.
12 31
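The calendar could also be generated instead of typed in. A sketch, assuming a non-leap year since the question mentions 365 days (2015 is an arbitrary choice):

```r
# Every day of a non-leap year; 2015 is an arbitrary choice with 365 days
days <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "day")
calendar <- data.frame(month = as.integer(format(days, "%m")),
                       day   = as.integer(format(days, "%d")))
nrow(calendar)  # 365
```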
Target table:
category month day date quantity
1 1 1 0101 20
1 1 2 0102 0 (no data for this day)
1 1 3 0103 40
1 1 4 0104 0 (no data; rows continue through the whole year, up to month 12, day 31)
.
.
.
2 1 1 0101 10
.
.
.
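One vectorised route to the target table above, using only base R's `merge()`, is a Cartesian product followed by a left join. A sketch, assuming the column names match the example tables and that `data` has at most one row per category and day (duplicates would yield extra rows, as they also would in the loop); the sample values are made up:

```r
# Made-up data shaped like the example tables
data <- data.frame(category = c(1, 1, 2, 2),
                   month    = c(1, 1, 1, 1),
                   day      = c(1, 3, 1, 2),
                   quantity = c(20, 40, 10, 15))
days <- seq(as.Date("2015-01-01"), as.Date("2015-12-31"), by = "day")
calendar <- data.frame(month = as.integer(format(days, "%m")),
                       day   = as.integer(format(days, "%d")))

# 1. Cartesian product: every category paired with every calendar day
grid <- merge(data.frame(category = unique(data$category)), calendar, by = NULL)

# 2. Left join the observed quantities (all.x = TRUE keeps days without data)
result <- merge(grid, data, by = c("category", "month", "day"), all.x = TRUE)

# 3. Days without data come back as NA; set them to 0
result$quantity[is.na(result$quantity)] <- 0

# 4. Add the MMdd column, then order rows and columns like the target table
result$date <- sprintf("%02d%02d", result$month, result$day)
result <- result[order(result$category, result$month, result$day),
                 c("category", "month", "day", "date", "quantity")]
rownames(result) <- NULL
head(result, 4)  # quantity for category 1, days 1-4: 20 0 40 0
```

Each `merge()` call processes whole columns at once, so there is no per-row lookup or repeated copying.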
I can't share the real data because it is confidential, sorry. I hope this is enough to understand my problem.
I know it's a mess, but I couldn't come up with anything better; I don't have much experience optimizing R code yet.
Any help would be much appreciated, because right now R hangs when it runs this, even though the tables are not that big (555 categories * 365 days of the year, i.e. 202,575 rows in the result).