
I have a data frame that stores the amount someone spends per transaction for this month. I'm trying to write a loop that checks for repeated user IDs, sums the amount each user spent, and stores that total in the first record in which the user appears. It should set the amount in every other occurrence of that user to 0.

I keep getting "Error: No loop for break/next, jumping to top level" when I stop it mid-run:

# Number of trips 
numTrips <- NROW(tripData)

# For each trip in data
for (i in 1:numTrips){
  
  # For each trip after i
  for (j in ((i+1): numTrips)){
    
    # If the user IDs match, sum prices
    if (tripData[i,]$user_id == tripData[j,]$user_id){
      tripData[i,]$original_price <- tripData[i,]$original_price + tripData[j,]$original_price
      tripData[j,]$original_price <- 0
    }

  }
}

Can someone please help?
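Separately from the interrupt message, the loop has an off-by-one problem: on the last pass of the outer loop `i` equals `numTrips`, so `(i+1):numTrips` evaluates to `c(numTrips + 1, numTrips)` (R's `:` counts down when the start is larger than the end). The inner loop therefore reads a row past the end of the data (for a plain data frame that row is all `NA`, so the `if` condition becomes `NA`) and would then compare the last row with itself. Below is a minimal sketch of a corrected loop, assuming only the `user_id` and `original_price` columns from the question; the six-row `tripData` is made-up toy data.

```r
# Hypothetical toy data using the question's column names
tripData <- data.frame(
  user_id = c("a", "b", "a", "c", "b", "a"),
  original_price = c(10, 5, 20, 7, 1, 3)
)

numTrips <- nrow(tripData)

# seq_len(numTrips - 1) is empty for a one-row data frame, and
# seq.int(i + 1, numTrips) never goes past the last row
for (i in seq_len(numTrips - 1)) {
  for (j in seq.int(i + 1, numTrips)) {
    if (tripData$user_id[i] == tripData$user_id[j]) {
      # Move row j's amount onto row i, then zero row j
      tripData$original_price[i] <- tripData$original_price[i] + tripData$original_price[j]
      tripData$original_price[j] <- 0
    }
  }
}

tripData
# original_price is now 33, 6, 0, 7, 0, 0
```

Indexing the column vectors directly (`tripData$original_price[i]`) avoids building a one-row data frame with `tripData[i,]` on every comparison, but the nested loops are still quadratic in the number of rows.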

trysav
It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. How many rows are there? The way you've written this, it scales quadratically, so it will take a very long time if you have a large number of rows. It looks like this would be much easier with a `dplyr` `group_by` and `summarize`. – MrFlick Jun 29 '20 at 23:03
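Along the lines of the comment's point about quadratic scaling, here is a vectorised base-R sketch of the same "total on the first row, 0 elsewhere" result. It swaps in `ave()` and `duplicated()` rather than the `dplyr` approach the comment suggests, and it uses a made-up six-row `tripData` with the question's column names. Each step works on whole columns, so there is no row-by-row comparison.

```r
# Hypothetical toy data with the question's column names;
# each user's trips are interleaved rather than sorted
tripData <- data.frame(
  user_id = c("a", "b", "a", "c", "b", "a"),
  original_price = c(10, 5, 20, 7, 1, 3)
)

# ave() returns, for every row, the sum of original_price over all rows
# that share that row's user_id
totals <- ave(tripData$original_price, tripData$user_id, FUN = sum)

# duplicated() is FALSE only for the first occurrence of each user_id,
# so the first row gets the total and every later row gets 0
tripData$original_price <- ifelse(duplicated(tripData$user_id), 0, totals)

tripData
# original_price is now 33, 6, 0, 7, 0, 0
```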

1 Answer


I'll follow @MrFlick's suggestion and start with some sample data:

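library(dplyr)  # for tibble(), %>%, group_by(), mutate(), summarize()
set.seed(42)
dat <- tibble(
  id = rep(1:3, each=3),
  when = sort(Sys.Date() - sample(10, size=9)),  # relative to today, so re-running gives different dates
  amt = sample(1e4, size=9))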
dat
# # A tibble: 9 x 3
#      id when         amt
#   <int> <date>     <int>
# 1     1 2020-06-19   356
# 2     1 2020-06-20  7700
# 3     1 2020-06-21  3954
# 4     2 2020-06-22  9091
# 5     2 2020-06-23  5403
# 6     2 2020-06-24   932
# 7     3 2020-06-25  9189
# 8     3 2020-06-27  5637
# 9     3 2020-06-28  4002

It sounds like you want to sum the amounts for each id and put that total in the id's first row, keeping the other rows but with their amounts zeroed out.

dat %>%
  group_by(id) %>%
  mutate(amt2 = c(sum(amt), rep(0, n() - 1)))
# # A tibble: 9 x 4
# # Groups:   id [3]
#      id when         amt  amt2
#   <int> <date>     <int> <dbl>
# 1     1 2020-06-19   356 12010
# 2     1 2020-06-20  7700     0
# 3     1 2020-06-21  3954     0
# 4     2 2020-06-22  9091 15426
# 5     2 2020-06-23  5403     0
# 6     2 2020-06-24   932     0
# 7     3 2020-06-25  9189 18828
# 8     3 2020-06-27  5637     0
# 9     3 2020-06-28  4002     0
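The same `mutate()` can also overwrite `original_price` in place, as the question asks. This is a sketch assuming the data uses the question's `user_id` and `original_price` column names; the toy `tripData` here is made up, with each user's trips interleaved. A grouped `mutate()` returns rows in their original order, so the first row in which each user appears gets the total even when the data isn't sorted by user.

```r
library(dplyr)

# Hypothetical toy data: the question's column names, with each user's
# trips interleaved rather than sorted by user_id
tripData <- data.frame(
  user_id = c("a", "b", "a", "c", "b", "a"),
  original_price = c(10, 5, 20, 7, 1, 3)
)

tripData <- tripData %>%
  group_by(user_id) %>%
  # Within each user, the first row (in the current row order) gets the
  # total and every later row gets 0
  mutate(original_price = c(sum(original_price), rep(0, n() - 1))) %>%
  # Drop the grouping so later operations aren't done per user
  ungroup()

tripData
# original_price is now 33, 6, 0, 7, 0, 0
```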

If instead you just want the summaries, you can use this:

dat %>%
  group_by(id) %>%
  summarize(amt = sum(amt))
# # A tibble: 3 x 2
#      id   amt
#   <int> <int>
# 1     1 12010
# 2     2 15426
# 3     3 18828
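If you'd rather not depend on `dplyr` for the totals, base R's `aggregate()` with a formula gives the same one-row-per-id summary. This is a swapped-in alternative, not part of the answer's approach; the sketch rebuilds `dat` from the `id` and `amt` values printed earlier and leaves out the `when` column.

```r
# id and amt values copied from the printed dat above (when is omitted)
dat <- data.frame(
  id  = rep(1:3, each = 3),
  amt = c(356L, 7700L, 3954L, 9091L, 5403L, 932L, 9189L, 5637L, 4002L)
)

# Sum amt within each id; the result is a data frame with columns id and amt
totals <- aggregate(amt ~ id, data = dat, FUN = sum)
totals
# id 1, 2, 3 with amt 12010, 15426, 18828, matching the summarize() output
```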

Or, if you also want to keep each id's date range:

dat %>%
  group_by(id) %>%
  summarize(whenfrom = min(when), whento = max(when), amt = sum(amt))
# # A tibble: 3 x 4
#      id whenfrom   whento       amt
#   <int> <date>     <date>     <int>
# 1     1 2020-06-19 2020-06-21 12010
# 2     2 2020-06-22 2020-06-24 15426
# 3     3 2020-06-25 2020-06-28 18828

r2evans