2

I am using R..

I have a dataframe about reservations in a transportation company:

Van Route Departure Price Customer ID
U21 LA - SF 8:00:00 30.00 467866578
U21 LA - SF 8:00:00 30.00 234656433
U21 LA - SF 8:00:00 30.00 654343554
U21 LA - SF 8:00:00 30.00 466534444
U21 LA - SF 8:00:00 30.00 354543433
U22 LA - SD 6:00:00 20.00 345464533
U22 LA - SD 6:00:00 20.00 345456777
U22 LA - SD 6:00:00 20.00 344565411
U22 LA - SD 6:00:00 20.00 119873566

I would like to make a new data frame that shows this:

Van Route Departure Price Tickets Sold Revenue
U21 LA - SF 8:00:00 30.00 5 150.00
U22 LA - SD 6:00:00 20.00 4 80.00

Thanks in advance!!!!! Please help :)

3 Answers3

2

or using dplyr

library(dplyr)

df_new<-df %>%
  group_by(Van, Route, Departure, Price)  %>%
  summarize(`Tickets Sold`=n(),
            Revenue=sum(Price)) %>%
  ungroup()

df_new
#> # A tibble: 2 × 6
#>   Van   Route   Departure Price `Tickets Sold` Revenue
#>   <chr> <chr>   <chr>     <int>          <int>   <int>
#> 1 U21   LA - SF 8:00:00      30              5     150
#> 2 U22   LA - SD 6:00:00      20              4      80

Created on 2022-02-02 by the reprex package (v2.0.1)

Joe Erinjeri
  • 1,200
  • 1
  • 7
  • 15
0
# load package
library(data.table)

# set dataframe as datatable
setDT(df)

# calculate
df[, .(tickets_sold = .N
       , revenue = sum(Price)
       )
   , .(Van, Route, Departure, Price)
   ]
Sweepy Dodo
  • 1,761
  • 9
  • 15
0

You can dplyr::group_by() the common variables and then use dplyr::summarize() to calculate the number of entries per group with n() and the total Revenue with sum().

library(tidyverse)

d <- structure(list(Van = c("U21", "U21", "U21", "U21", "U21", "U22", "U22", "U22", "U22"), Route = c("LA - SF", "LA - SF", "LA - SF", "LA - SF", "LA - SF", "LA - SD", "LA - SD", "LA - SD", "LA - SD"), Departure = c("8:00:00", "8:00:00", "8:00:00", "8:00:00", "8:00:00", "6:00:00", "6:00:00", "6:00:00", "6:00:00"), Price = c(30, 30, 30, 30, 30, 20, 20, 20, 20), Customer.ID = c(467866578L, 234656433L, 654343554L, 466534444L, 354543433L, 345464533L, 345456777L, 344565411L, 119873566L)), class = "data.frame", row.names = c(NA, -9L))

d %>% 
  group_by(across(Van:Departure)) %>% 
  summarize(Tickets_Sold = n(), Revenue = sum(Price), .groups = "drop")
#> # A tibble: 2 x 5
#>   Van   Route   Departure Tickets_Sold Revenue
#>   <chr> <chr>   <chr>            <int>   <dbl>
#> 1 U21   LA - SF 8:00:00              5     150
#> 2 U22   LA - SD 6:00:00              4      80

Created on 2022-02-02 by the reprex package (v2.0.1)

Dan Adams
  • 4,971
  • 9
  • 28