
I have a main data frame (data) with information about purchases: name, year, city, and a few other variables:

Name Year City
N1   2018 NY
N2   2019 SF
N2   2018 SF
N1   2010 NY
N3   2020 AA

I used new_data <- data %>% group_by(Name) %>% tally(name = "Count") to get something like this:

Name Count
N1   2
N2   2
N3   1
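To make the example reproducible, here is a minimal sketch that rebuilds the sample data shown above. It includes only the Name, Year, and City columns (the "few other variables" are omitted, and storing Name and City as character rather than factor is an assumption). It also shows dplyr's count(), which does the grouping and tallying in one call and should give the same counts as the group_by() + tally() pipeline.

```r
library(dplyr)

# Sample data from the question (other variables omitted; character columns assumed)
data <- data.frame(
  Name = c("N1", "N2", "N2", "N1", "N3"),
  Year = c(2018L, 2019L, 2018L, 2010L, 2020L),
  City = c("NY", "SF", "SF", "NY", "AA"),
  stringsAsFactors = FALSE
)

# The question's approach (note the underscore in group_by)
new_data <- data %>%
  group_by(Name) %>%
  tally(name = "Count")

# Shorthand: count() groups by Name and tallies in a single step
new_data2 <- data %>% count(Name, name = "Count")

print(new_data)
```

The two results differ only in class (tally() on grouped data returns a tibble, while count() on a plain data.frame returns a data.frame), not in content.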

My questions, preferably using dplyr:

1) How do I now add the city that corresponds to each Name to new_data, i.e.:

Name Count City
N1   2     NY
N2   2     SF
N3   1     AA

2) How do I add the earliest year of each Name to new_data, i.e.:

Name Count City Year
N1   2     NY   2010
N2   2     SF   2018
N3   1     AA   2020

2 Answers


It seems that summarise() may suit you better than tally(), since it can compute several summaries at once, for example:

library(dplyr)

data %>%
  group_by(Name, City) %>%
  summarise(Count = n(),
            Year = min(Year))

Output:

# A tibble: 3 x 4
# Groups:   Name [3]
  Name  City  Count  Year
  <fct> <fct> <int> <int>
1 N1    NY        2  2010
2 N2    SF        2  2018
3 N3    AA        1  2020

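Note that City is also included in group_by() so that it is kept in the output. This works because each Name has only one City; if a Name appeared with several cities, it would be split across several rows.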
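If you would rather keep new_data as you already computed it and attach the extra columns afterwards (closer to how questions 1 and 2 are phrased), one option is to build a per-Name summary and join it on. This is a minimal sketch, assuming each Name has exactly one City; first(City) simply takes the first City seen for that Name, so it would silently hide conflicting cities. The helper name extra is hypothetical, and the .groups argument requires dplyr 1.0.0 or later.

```r
library(dplyr)

# Sample data from the question (other variables omitted)
data <- data.frame(
  Name = c("N1", "N2", "N2", "N1", "N3"),
  Year = c(2018L, 2019L, 2018L, 2010L, 2020L),
  City = c("NY", "SF", "SF", "NY", "AA"),
  stringsAsFactors = FALSE
)

# Existing counts, as in the question
new_data <- data %>% count(Name, name = "Count")

# One row per Name: its City (assumed unique per Name) and its earliest Year
extra <- data %>%
  group_by(Name) %>%
  summarise(City = first(City), Year = min(Year), .groups = "drop")

# Attach City and Year to the existing counts, matching on Name
new_data <- new_data %>% left_join(extra, by = "Name")

print(new_data)
```

Using .groups = "drop" returns an ungrouped result, which avoids the "# Groups: Name [3]" line seen in the output above.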

arg0naut91

An option with data.table:

library(data.table)
setDT(data)[, .(Count = .N, Year = min(Year)), .(Name, City)]
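A self-contained version of the data.table one-liner, as a sketch that rebuilds the question's sample data (other variables omitted). setDT() converts the data frame to a data.table in place, .N is data.table's per-group row count, and by-groups are returned in order of first appearance.

```r
library(data.table)

# Sample data from the question (other variables omitted)
data <- data.frame(
  Name = c("N1", "N2", "N2", "N1", "N3"),
  Year = c(2018L, 2019L, 2018L, 2010L, 2020L),
  City = c("NY", "SF", "SF", "NY", "AA"),
  stringsAsFactors = FALSE
)

# Convert to a data.table by reference (no copy is made)
setDT(data)

# Count rows and take the earliest Year within each Name/City group
result <- data[, .(Count = .N, Year = min(Year)), by = .(Name, City)]

print(result)
```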
akrun