I am relatively new to the concept of vectorization, and would like to ask whether the community has any suggestions for improving the run time of a process I have been using to download Bloomberg API data and bind it to a matrix.

Currently, this process iterates through each individual date in my API call, which takes quite a bit of time. I am wondering whether I can do this in a "vectorized" way, making numerous calls at once and then binding the results to a data frame, to reduce run time.

'''

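# bdp() is assumed to come from the Rblpapi package, which needs an open connection
library(Rblpapi)
blpConnect()

#create fund names to feed through as param in loop below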
fundList <- c("fund 1 on bloomberg",
"fund 2 on bloomberg",
"fund 3 on bloomberg",
"fund 4 on bloomberg",
"fund 5 on bloomberg",
"fund 6 on bloomberg",
"fund 7 on bloomberg"
)

#create datelist for params for loop
# Sys.Date() is the base R equivalent of lubridate::today()
newDateList <- seq(Sys.Date() - 1401, length.out = 1401, by = "day")
newDateListReformatted <- gsub("-","",newDateList)


#create df object and loop through bloomberg API, assign to dataframe object
df_total = data.frame()

for(fund in 1:length(fundList)){
  
  df_total = data.frame()
  
  # stop one element short: each period uses dates b and b+1
  for(b in seq_len(length(newDateListReformatted) - 1)){
    ovrd <- c("CUST_TRR_START_DT"=newDateListReformatted[b],"CUST_TRR_END_DT"=newDateListReformatted[b+1])
    print(ovrd)
    model <- bdp(fundList[fund],"CUST_TRR_RETURN_HOLDING_PER",overrides=ovrd)
    print(model)
    df <- data.frame(model)
    df1 <- data.frame(newDateListReformatted[b+1])
    df2 <- cbind(df,df1)
    df_total <- rbind(df_total,df2)
  }
  
  assign(fundList[fund],df_total)

}

'''

The outer loop first picks a fund, then the inner loop iterates through all the dates and binds rows to the data frame one at a time, before moving to the next fund in fundList and iterating through the time series again.

The way I am thinking about it, I would pass a vector of multiple date parameters to the function, and then "vertically" assign the results to the df_total matrix several rows at a time instead of one per iteration, reducing run time. Alternatively, I could call each individual date, but do it across a number of funds and assign them "horizontally" to the matrix.

Any thoughts are appreciated.

  • Not everybody has access to the Bloomberg API; try to create a [reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) example to increase your chances. The solution to this type of problem usually involves the `apply` [family of functions](https://ademos.people.uic.edu/Chapter4.html). – Trusky Aug 20 '20 at 17:39

1 Answer


Vectorization means writing functions that efficiently handle many input values in a single call. For example, one can calculate the mean of each column using a loop, `lapply(mtcars, mean)`, or use the vectorized function `colMeans(mtcars)`. The latter is much more efficient than the loop, because the function is optimized to work over the whole input at once.

On Stack Overflow, vectorization is often confused with readability of code, and so using an `*apply` function is often considered vectorization. While these functions are useful for readability, they do not (by themselves) speed up your code.
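To make the distinction concrete, here is a minimal sketch on the built-in `mtcars` data. The variable names are only illustrative. All three approaches return the same numbers; the first two still call `mean()` once per column from R, while `colMeans()` does the work in a single call.

```r
# Explicit loop with a preallocated result vector
means_loop <- numeric(ncol(mtcars))
names(means_loop) <- names(mtcars)
for (j in seq_along(mtcars)) {
  means_loop[j] <- mean(mtcars[[j]])
}

# sapply(): the same per-column loop, just written more compactly
means_apply <- sapply(mtcars, mean)

# colMeans(): a single vectorized call over the whole data frame
means_vec <- colMeans(mtcars)

# The results agree; only the amount of R-level iteration differs
stopifnot(isTRUE(all.equal(means_loop, means_apply)),
          isTRUE(all.equal(means_apply, means_vec)))
```

On a data set this small the timing difference is negligible; wrapping each variant in `system.time()` on a much wider input is one way to check it for yourself.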

For your specific example, the bottleneck comes in part from the repeated calls to `bdp` and in part from iteratively expanding your result using `cbind`, `rbind` and `assign`.
To speed up your code, we first need to be aware of how the function is implemented. From the documentation we can read that `fields` and `securities` accept multiple values. These arguments are thus vectorized, while `overrides` only accepts a single named vector of override fields. This means we can eliminate the outer loop over funds in your code by providing all the securities in one go, while the loop over date periods has to stay.

Next, to reduce the overhead of iteratively expanding your data.frame with repeated calls to `rbind`, we can store the intermediate results in a list and combine everything in one go once the loop has finished. Combining these, we get a code example such as the one below

n <- length(newDateListReformatted)
# Create override matrix (makes it easier to subset, but not strictly necessary)
periods <- matrix(c(newDateListReformatted[-n], newDateListReformatted[-1]), ncol = 2, byrow = FALSE)
colnames(periods) <- c('CUST_TRR_START_DT', 'CUST_TRR_END_DT')
models <- vector('list', n - 1)
for(i in seq_len(n - 1)){
  models[[i]] <- bdp(fundList, 
               'CUST_TRR_RETURN_HOLDING_PER', 
               overrides = periods[i, ]
               )
  # Add identifier columns
  models[[i]][,'CUST_TRR_START_DT'] <- periods[i, 1]
  models[[i]][,'CUST_TRR_END_DT'] <- periods[i, 2]
}
# Combine results in single data.frame (if wanted)
model <- do.call(rbind, models)

Note that the code finishes by combining the intermediary results using do.call(rbind, models) which gives a single data.frame, but one could use bind_rows from the dplyr package or rbindlist from the data.table package as well.
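As a small illustration of those three options, the sketch below builds a made-up list of data frames (`pieces` is purely illustrative) and combines it each way. The `dplyr` and `data.table` calls are guarded with `requireNamespace()`, so the block still runs if those packages are not installed.

```r
# A made-up list of small data frames, standing in for the per-period results
pieces <- lapply(1:3, function(i) data.frame(id = i, value = i * 10))

# Base R: one rbind call over the whole list
combined_base <- do.call(rbind, pieces)

# dplyr alternative, only run if the package is available
if (requireNamespace("dplyr", quietly = TRUE)) {
  combined_dplyr <- dplyr::bind_rows(pieces)
}

# data.table alternative (returns a data.table), only run if available
if (requireNamespace("data.table", quietly = TRUE)) {
  combined_dt <- data.table::rbindlist(pieces)
}

nrow(combined_base)
```

For a few thousand small pieces any of the three should be fine; the package versions mainly pay off when the list gets long.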

Further note that I do not have access to Bloomberg (currently), so I cannot test my code for possible spelling mistakes.
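One way to dry-run the loop structure without a terminal is to swap in a stand-in for `bdp`. The sketch below does that with a hypothetical `fake_bdp()` that returns dummy numbers; it is not part of Rblpapi, and it assumes (as I believe is the case, but have not verified) that the real function returns one row per security with the securities as row names. The fund names and dates are made up as well.

```r
# Hypothetical stand-in for bdp(): one row per security, one column per field,
# filled with dummy values. Not part of any package.
fake_bdp <- function(securities, fields, overrides = NULL) {
  stopifnot(!is.null(names(overrides)))  # overrides must be a named vector
  values <- matrix(seq_along(securities) / 100,
                   nrow = length(securities), ncol = length(fields),
                   dimnames = list(securities, fields))
  as.data.frame(values)
}

funds <- c("fund 1", "fund 2")  # made-up identifiers
dates <- format(seq(as.Date("2020-01-01"), by = "day", length.out = 4), "%Y%m%d")
n <- length(dates)
periods <- cbind(CUST_TRR_START_DT = dates[-n], CUST_TRR_END_DT = dates[-1])

models <- vector("list", n - 1)
for (i in seq_len(n - 1)) {
  res <- fake_bdp(funds, "CUST_TRR_RETURN_HOLDING_PER", overrides = periods[i, ])
  # Keep the security as a column: rbind() alters duplicated row names
  res$security <- rownames(res)
  res$CUST_TRR_START_DT <- periods[i, 1]
  res$CUST_TRR_END_DT <- periods[i, 2]
  models[[i]] <- res
}
model <- do.call(rbind, models)
rownames(model) <- NULL
```

Copying the row names into a `security` column is a design choice on my part: the same securities appear in every period, so after `rbind` the row names alone would no longer identify the fund cleanly.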

Oliver