0

I have a table where is user id and the duration of the trip in hours.

data.frame(
  check.names = FALSE,
      user_id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
   `trip.(h)` = c(0, 0.02, 170, 0.07, 12, 3, 90, 550, 1)
)
user_id trip (h)
1 0
1 0.02
1 170
1 0.07
1 12
1 3
1 90
1 550
1 1
... ...

Now I would like to add a column with a trip id for each person. Trip id should change for each person (unique user_id ) if any value in column trip is longer than 155 hours. I would do it with a loop, but the dataframe contains 12 columns and 5.7m rows so it would be highly ineffective. Is it possible to do it columnwise with dplyr? Since I have multiple user_id, I would firstly group by user id and then maybe do it with mutate()? Maybe with fill()?

Also thought of counting >155h trips for each person and then assigning 1:n() elements to each trip, but it also seems a little far fetched

This is the desired output:

data.frame(
  check.names = FALSE,
      User.id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
   `trip.(h)` = c(0, 0.02, 170, 0.07, 12, 3, 90, 550, 1),
      trip_id = c(1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 3L)
)
User id trip (h) trip_id
1 0 1
1 0.02 1
1 170 2
1 0.07 2
1 12 2
1 3 2
1 90 2
1 550 3
1 1 3
... ... ...
Eric
  • 2,699
  • 5
  • 17
  • No, just whenever the trip is over 155h it is a start of a new trip – Brigitta Rebane Mar 17 '21 at 13:32
  • Welcome to StackOverflow. When you can, please review the following on how to provide a proper [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Knowing how to do this will prove beneficial in your questions to come! – Eric Mar 17 '21 at 14:09
  • Yes, thank you, very good point you showed! I will keep this in mind for the next time – Brigitta Rebane Mar 17 '21 at 14:20

2 Answers2

3

Try cumsum like below

df  %>% 
  group_by(User_id)  %>% 
  mutate(trip_id = cumsum(trip>155)+1) %>% 
  ungroup()
ThomasIsCoding
  • 96,636
  • 9
  • 24
  • 81
0

trip (h) will be read as trip..h. in R

your_data %>% 
  mutate(flag_new_trip = as.integer(trip..h. > 155)) %>% 
  group_by(user_id) %>% 
  mutate(trip_id = cumsum(flag_new_trip)+1)
Vinson Ciawandy
  • 996
  • 11
  • 26