1

I have dataset which is panel data that contains the following variables: 1. Country 2. Company 3. Monthly date 4. Revenue

 `A <- data.frame(Country=as.factor(rep('A', 138)), 
  Company = as.factor(c(rep('AAA', 12), rep('BBB', 8), rep('CCC', 72), rep('DDD', 46))),
  Date = c(seq(as.Date('2010-01-01'), as.Date('2011-01-01'), by = 'month'), 
  seq(as.Date('2010-01-01'), as.Date('2010-07-01'), by = 'month'),
  seq(as.Date('2010-01-01'), as.Date('2015-12-01'), by = 'month'),
  seq(as.Date('2012-03-01'), as.Date('2015-12-01'), by = 'month')), 
  Revenue= sample(10000:25000, 138)
)



B<- data.frame(Country=as.factor(rep('B', 108)), 
    Company = as.factor(c(rep('EEE', 36), rep('FFF', 36), rep('GGG', 36))),
    Date = c(seq(as.Date('2013-01-01'), as.Date('2015-12-01'), by = 'month'), 
             seq(as.Date('2013-01-01'), as.Date('2015-12-01'), by = 'month'),
             seq(as.Date('2013-01-01'), as.Date('2015-12-01'), by = 'month')),
    Revenue = sample(10000:25000, 108)
)`

I want to add other variable to the dataset - Competitor's revenue, which is the total sum of the revenues of all other companies in the own country for the corresponding month.

I wrote the following code:

    new_B<-data.frame()
for(i in 1:nlevels(B$Company)){ 
  temp_i<-B[which(B$Company==levels(B$Company)[i]),]
  temp_j<-B[which(B$Company!=levels(B$Company)[i]),]
  agg_temp<-aggregate(temp_j$Revenue, by = list(temp_j$Date), sum)
  temp_i$competitor_value<-ifelse(agg_temp$Group.1 %in% temp_i$Date, agg_temp$x, 0)
  new_B<-rbind(new_B, temp_i)
}

I created two temporary data set inside for loop one containing company i only and the other - all other companies. I summed all revenues of other companies by month. Then using ifelse for the same dates I add new variable to temp_i. It works nice for the companies that operated during the same period, but in country A there are companies that operated for different periods and when I try to use my code, I have error that they are not of the same length

    new_A<-data.frame()
  for(i in 1:nlevels(A$Company)){ 
    temp_i<-A[which(A$Company==levels(A$Company)[i]),]
    temp_j<-A[which(A$Company!=levels(A$Company)[i]),]
    agg_temp<-aggregate(temp_j$Revenue, by = list(temp_j$Date), sum)
    temp_i$competitor_value<-ifelse(agg_temp$Group.1 %in% temp_i$Date, agg_temp$x, 0)
    new_A<-rbind(new_A, temp_i)
  }

I found similar answer ifelse statements with dataframes of different lengths, but still do not know how to solve my problem.

I would really appreciate help

Community
  • 1
  • 1
Rfreak
  • 149
  • 4
  • 9

1 Answers1

3

I suggest a different approach using the dplyr package:

library(dplyr)
A %>% 
  bind_rows(B) %>%
  group_by(month=format(Date, "%Y-%m")) %>% 
  mutate(revComp = sum(Revenue)) %>%
  group_by(Company, add = T) %>%
  mutate(revComp = revComp-Revenue) 
# Source: local data frame [246 x 6]
# Groups: month, Company [246]
# 
#    Country Company       Date Revenue   month revComp
#      (chr)   (chr)     (date)   (int)   (chr)   (int)
# 1        A     AAA 2010-01-01   10657 2010-01   30356
# 2        A     AAA 2010-02-01   11620 2010-02   22765
# 3        A     AAA 2010-03-01   17285 2010-03   33329
# 4        A     AAA 2010-04-01   22886 2010-04   33469
# 5        A     AAA 2010-05-01   20129 2010-05   39974
# 6        A     AAA 2010-06-01   22865 2010-06   26896
# 7        A     AAA 2010-07-01   13087 2010-07   29542
# 8        A     AAA 2010-08-01   19451 2010-08   14842
# 9        A     AAA 2010-09-01   12364 2010-09   15309
# 10       A     AAA 2010-10-01   19375 2010-10   14090
lukeA
  • 53,097
  • 5
  • 97
  • 100