I think the solution to my problem is quite basic, but given that I am an R starter, I still find it difficult to correctly combine the different steps (in Excel, I could easily do all this in seconds). However, let's look at two simplified datasets:
#Set up df1
a <- c(1000, 1000, 1001, 1001)
b <- c(9000, 9001, 9010, 9011)
c <- c("Apple", "Apple", "BP", "BP")
df1 <- data.frame(a,b,c)
names(df1) <- c("FirmNo", "StockNo", "FirmName")
and
#Set up df2
one <- c(9000, 9000, 9001, 9001, 9010, 9010, 9011, 9011, 9011)
two <- c(5, 15, 100, 200, 20, 80, NA, 850, 800)
df2 <- data.frame(one,two)
names(df2) <- c("StockNo", "Volume")
Which result into
FirmNo StockNo FirmName
1 1000 9000 Apple
2 1000 9001 Apple
3 1001 9010 BP
4 1001 9011 BP
and
StockNo Volume
1 9000 5
2 9000 15
3 9001 100
4 9001 200
5 9010 20
6 9010 80
7 9011 NA
8 9011 850
9 9011 800
respectively. What I would like to achieve is the following: Enrich df1 with the average volume based on the volume figures in df2, so that the result looks like this:
FirmNo StockNo FirmName AvgVol
1 1000 9000 Apple 10
2 1000 9001 Apple 150
3 1001 9010 BP 50
4 1001 9011 BP 825
Please mind that the actual tables are much longer, but the structure is essentially the same. I could assume that a few lines - maybe even with dplyr - could easily do the job. I already found the following solution, but think that there is something much more elegant:
df2 <- df2 %>%
group_by(StockNo) %>%
mutate(AvgVol = mean(Volume, na.rm=TRUE)) #Add AvgVol column with respective data
df2 <- distinct(df2, StockNo, avgVol) #Reduce dataset to individual lines
df1 <- merge(df1, df2, by="StockNo") #Merge df1 and df2
Thanks a lot for your help in advance and kindly come back to me in case I did not explain myself well.