I am trying to create a summary table that shows each bike's usage within a borough. The formula is:

(No. of times a Bike is rented in particular Borough) / (Total No of rentals in that Borough).

The final output should look something like this:

BikeId   Borough       Pct
    1     K&C          0.02
    1     Hammersmith  0.45
    7     K&C          0.32 

To achieve that I am trying to implement a function as below:

smplData <- function(df) {
  # initialize an empty data frame
  summDf <- data.frame(BikeId = character(), Borough = character(),
                       Pct = double())

  # create a vector of unique borough names
  boro <- unique(df[, "Start.Borough"])
  for (i in 1:length(boro)) {
    # loop through each borough and create a frequency table
    bkCntBor <- table(df[df$Start.Borough == boro[i], "Bike.Id"])
    # total number of rentals in a particular borough
    borCnt <- nrow(df[df$Start.Borough == boro[i], ])
    for (j in 1:length(bkCntBor)) {
      # loop through each bike for the ith borough and calculate the ratio of the jth bike
      bkPct <- as.vector(bkCntBor[j]) / borCnt
      # temp data frame to store a single row corresponding to bike, borough and ratio
      dfTmp <- data.frame(BikeId = names(bkCntBor[j]), Borough = boro[i],
                          Pct = bkPct)
      # append to summary table
      summDf <<- rbind(summDf, dfTmp)
    }
  }
}

The head of the df dataset is as below

>head(df)
Bike.Id Start.Borough Rental.Id
      1           K&C  61349872
      1           K&C  61361611
      1   Royal Parks  61362295
      1           K&C  61364627
      1           K&C  61367817
      1           H&F  61368333

When I run the function, I get the error below after one record has been inserted into summDf:

Error in data.frame(BikeId = names(bkCntBor[j]), Borough = boro[i], Pct = bkPct) : arguments imply differing number of rows: 0, 1

I can run the function's code in the console by passing one value at a time for i and j, but when I run it as a function I get the error mentioned above. Any help you can provide would be much appreciated.
Here is some sample data:

Bike.Id    Start.Borough
1            K&C      
1            K&C    
1            K&C    
7            K&C  
7            K&C  
1            Hammersmith
1            Hammersmith 
7            Hammersmith 
9            Hammersmith
9            Westminster               
rkadam
  • You should share sample data in a [reproducible format](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Also, you seem to be missing a brace in your function code so it's not syntactically valid. And finally, adding one row at a time to a data.frame is a terribly inefficient procedure. It would be better to start out with a clear description of what you are trying to do rather than requiring us to interpret the intentions of this non-functioning code. – MrFlick Mar 22 '17 at 16:43
  • @MrFlick Apologies for that. I have updated the question with the intended goal and sample data, and fixed the missing brace in the code. – rkadam Mar 22 '17 at 17:01

1 Answer

Here's an option using dplyr:

library(dplyr)
dd %>%
  group_by(Start.Borough, Bike.Id) %>%
  summarize(n = n()) %>%          # rentals per borough/bike pair
  mutate(pct = n / sum(n)) %>%    # share of that bike within its borough
  select(-n)

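First we use group_by() and summarize() to find the counts of each borough/bike combination. Because summarize() drops the last grouping level (Bike.Id), the result is still grouped by Start.Borough, so mutate() divides each borough/bike count by the total number of rentals in that borough. This returns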

  Start.Borough Bike.Id   pct
         <fctr>   <int> <dbl>
1   Hammersmith       1  0.50
2   Hammersmith       7  0.25
3   Hammersmith       9  0.25
4           K&C       1  0.60
5           K&C       7  0.40
6   Westminster       9  1.00

with the sample input

dd <- data.frame(Bike.Id = c(1, 1, 1, 7, 7, 1, 1, 7, 9, 9), 
    Start.Borough = c("K&C", "K&C", "K&C", "K&C", "K&C", "Hammersmith", 
    "Hammersmith", "Hammersmith", "Hammersmith", "Westminster"))
MrFlick
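
For readers who would rather keep the loop-based approach from the question, here is a minimal sketch of how it could be repaired. It rests on assumptions, since the full data set is not shown: the error is consistent with `1:length(x)` evaluating to `c(1, 0)` when `x` is empty (for example, if a borough value is `NA`), and `<<-` writes to a global `summDf` rather than the local one, so the function never returns the accumulated table. The function name `bikeShare` and the variable names inside it are hypothetical, not taken from the original post.

```r
# Sketch of a repaired loop version (hypothetical helper, not the asker's code).
bikeShare <- function(df) {
  pieces <- list()  # one small data frame per borough, bound once at the end
  for (b in unique(as.character(df$Start.Borough))) {
    if (is.na(b)) next  # skip missing borough values
    inBoro <- !is.na(df$Start.Borough) & df$Start.Borough == b
    bkCnt <- table(df$Bike.Id[inBoro])  # rentals per bike in this borough
    if (length(bkCnt) == 0) next        # nothing to report for this borough
    pieces[[b]] <- data.frame(
      BikeId  = names(bkCnt),
      Borough = b,
      Pct     = as.vector(bkCnt) / sum(bkCnt),
      stringsAsFactors = FALSE
    )
  }
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out  # return the result instead of assigning it with <<-
}

# Sample input from the answer above
dd <- data.frame(
  Bike.Id = c(1, 1, 1, 7, 7, 1, 1, 7, 9, 9),
  Start.Borough = c("K&C", "K&C", "K&C", "K&C", "K&C", "Hammersmith",
                    "Hammersmith", "Hammersmith", "Hammersmith", "Westminster")
)
res <- bikeShare(dd)
print(res)

# Cross-check with a vectorized base R computation:
# margin = 1 divides each row (borough) by its row total
print(prop.table(table(dd$Start.Borough, dd$Bike.Id), margin = 1))
```

Building one data frame per borough and binding them once avoids growing a data frame row by row, which the comment on the question points out is inefficient. Iterating over the borough values directly also avoids the `1:length()` pitfall; where an index is needed, `seq_along()` is the safer choice.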