I am using R and I have a dataframe that looks like this:

Van   Route   Price   Tickets Sold   Revenue
U67   12333   30.00             11    330.00
U67   12333   25.00              5    125.00
U67   12333   20.00             10    200.00
U69   65111   30.00             15    450.00
U69   65111   25.00              8    200.00
U69   65111   20.00             11    220.00

The real data frame is very large, but it has the same structure.

I would like to have a new dataframe that looks like this:

Van   Route   Price                 Tickets Sold   Revenue
U67   12333   30.00, 25.00, 20.00             26    655.00
U69   65111   30.00, 25.00, 20.00             34    870.00

Thanks in advance guys!!! :)

  • Hi @Mateo Guajardo! Please improve your post to help the people who will help you. Please include some data that can be pasted directly into R. Use `dput()` on a subset of your data (`dput(head(data))`) to do that. Take a look at this post: https://stackoverflow.com/help/minimal-reproducible-example – Jose Feb 04 '22 at 01:12
  • Part of this is covered by the duplicate links in an earlier question of yours (https://stackoverflow.com/q/70964517/5325862). The rest is covered [here](https://stackoverflow.com/q/15933958/5325862) – camille Feb 04 '22 at 01:45

2 Answers

Let's assume that your first data frame is called df1:

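library(dplyr)

# Collapse each Van/Route group into a single row
df2 <- df1 %>%
  group_by(Van, Route) %>%
  summarise(Price = paste(Price, collapse = ", "),  # prices joined into one string
            `Tickets Sold` = sum(`Tickets Sold`),
            Revenue = sum(Revenue),
            .groups = "drop")                        # return an ungrouped result
df2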
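For a version that can be pasted straight into R, here is a minimal sketch of the same approach. The `df1` below is rebuilt by hand from the six rows shown in the question, so the column types are an assumption (for example, `Route` is treated as numeric). The `sprintf("%.2f", ...)` call is an addition that keeps two decimals in the collapsed string; plain `paste()` on numeric prices would give `30, 25, 20` instead.

```r
library(dplyr)

# Sample data copied from the question; check.names = FALSE keeps the
# space in the "Tickets Sold" column name.
df1 <- data.frame(
  Van            = c("U67", "U67", "U67", "U69", "U69", "U69"),
  Route          = c(12333, 12333, 12333, 65111, 65111, 65111),
  Price          = c(30, 25, 20, 30, 25, 20),
  `Tickets Sold` = c(11, 5, 10, 15, 8, 11),
  Revenue        = c(330, 125, 200, 450, 200, 220),
  check.names    = FALSE
)

df2 <- df1 %>%
  group_by(Van, Route) %>%
  summarise(
    # Format each price with two decimals, then join them into one string
    Price          = paste(sprintf("%.2f", Price), collapse = ", "),
    `Tickets Sold` = sum(`Tickets Sold`),
    Revenue        = sum(Revenue),
    .groups        = "drop"
  )

df2
```

With the sample rows, this should give one row per Van/Route pair, with 26 and 34 tickets sold and revenues of 655 and 870.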
Dave2e
Bloxx
Most of what you want is possible. Consolidating the Van, Route, Tickets Sold and Revenue columns is fairly straightforward in dplyr. Consolidating the Price column the way you want is less straightforward. A data frame cell normally holds a single value; R does allow list-columns, which can hold a whole vector per row, but they are more awkward to print, export and work with. The simpler option is to store the prices as a single string, although that could make using them later more difficult. You would need to say more about what you want to do with this data frame for me to be able to tell which is better.
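If the prices need to stay numeric, a list-column is one option to try. The sketch below assumes dplyr is available and rebuilds `df1` by hand from the rows in the question; `df_list` is just an illustrative name. Wrapping the column in `list()` inside `summarise()` stores one numeric vector per group rather than a string.

```r
library(dplyr)

# Sample data copied from the question (column types are an assumption)
df1 <- data.frame(
  Van            = c("U67", "U67", "U67", "U69", "U69", "U69"),
  Route          = c(12333, 12333, 12333, 65111, 65111, 65111),
  Price          = c(30, 25, 20, 30, 25, 20),
  `Tickets Sold` = c(11, 5, 10, 15, 8, 11),
  Revenue        = c(330, 125, 200, 450, 200, 220),
  check.names    = FALSE
)

df_list <- df1 %>%
  group_by(Van, Route) %>%
  summarise(
    Price          = list(Price),  # one numeric vector per group
    `Tickets Sold` = sum(`Tickets Sold`),
    Revenue        = sum(Revenue),
    .groups        = "drop"
  )

# The prices are still numbers, so they can be used in calculations,
# for example the highest price per Van/Route pair
max_price <- sapply(df_list$Price, max)
max_price
```

The trade-off is that a list-column cannot be written to a flat file such as a CSV without converting it first, which is one reason the string version may still be the more practical choice.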

If you are okay with turning the prices into a string, the code Bloxx provided will work.

It should be noted that dplyr can be slow on very large data frames, which might be a concern given the size of yours. People often use data.table for large datasets because it is generally faster, but I am not well-versed in it, so I cannot say much more about it. If dplyr takes too long to run its transformations for you, I would suggest taking a look at data.table.
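As a rough sketch of what the data.table equivalent might look like, assuming the package is installed and using a hand-built copy of the rows from the question (`dt` and `dt2` are illustrative names), the same grouping can be written with `by`. Whether it is actually faster on your data is something you would need to measure.

```r
library(data.table)

# Sample data copied from the question (column types are an assumption)
dt <- data.table(
  Van            = c("U67", "U67", "U67", "U69", "U69", "U69"),
  Route          = c(12333, 12333, 12333, 65111, 65111, 65111),
  Price          = c(30, 25, 20, 30, 25, 20),
  `Tickets Sold` = c(11, 5, 10, 15, 8, 11),
  Revenue        = c(330, 125, 200, 450, 200, 220)
)

# One row per Van/Route pair: prices collapsed into a string,
# tickets and revenue summed
dt2 <- dt[, .(
  Price          = paste(sprintf("%.2f", Price), collapse = ", "),
  `Tickets Sold` = sum(`Tickets Sold`),
  Revenue        = sum(Revenue)
), by = .(Van, Route)]

dt2
```

An existing data frame can be converted with `as.data.table(df1)` before running the grouped step.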

user3124634