0

I have the following dataframe:

No Name Fruit Country
1  Tom  Pear  France
2  Tom  Pear  France
3  Tom  Pear  Poland
4  Bob  Kiwi  Poland
5  Bob  Pear  France
6  Ann  Pear  France
7  Ann  Pear  Poland
8  Dod  Pear  Poland
7  Dod  Plum  Poland
8  Dod  Plum  Poland
9  Dod  Pear  Germany
10 Dod  Kiwi  Estonia

How can I, for each unique name, count how many unique values there are in the fruit column and then add this number as a column to the dataframe?

The output I would like to get is this:

 No Name Fruit Country Unique
1  Tom  Pear  France   1
2  Tom  Pear  France   1
3  Tom  Pear  Poland   1
4  Bob  Kiwi  Poland   2
5  Bob  Pear  France   2
6  Ann  Pear  France   1
7  Ann  Pear  Poland   1
8  Dod  Pear  Poland   3
7  Dod  Plum  Poland   3
8  Dod  Plum  Poland   3
9  Dod  Pear  Germany  3
10 Dod  Kiwi  Estonia  3

I tried

df %>%
group_by(Name, Fruit) %>%
mutate(unique = n()) %>%
ungroup()

But the above does not work the way I wanted it to.

RNewbie
  • 15
  • 6
  • Greetings! Usually it is helpful to provide a minimally reproducible dataset for questions here. The way you have shared your data is reproducible but takes awhile to work with. One way of doing this is by using the `dput` function. You can find out how to use it here: https://youtu.be/3EID3P1oisg. You can also simply create your own toy dataset that resembles this one so its easier for others to help you. – Shawn Hemelstrand Sep 26 '22 at 00:48
  • 2
    Use `n_distinct(Fruit)` instead of `n()` in your attempt above. – Ritchie Sacramento Sep 26 '22 at 01:02
  • 2
    @RitchieSacramento - **and** don't group by `Fruit` – thelatemail Sep 26 '22 at 01:04
  • @thelatemail - whoops, yes indeed. – Ritchie Sacramento Sep 26 '22 at 01:08
  • Yeah I missed that the first go around. I learned something today haha – Shawn Hemelstrand Sep 26 '22 at 01:10
  • 1
    Thank you everyone. @Shawn Hemelstrang I will use the function you suggested from now on so that my questions are easier to work with. I appreciate your help. – RNewbie Sep 26 '22 at 01:13
  • No prob. This is usually where people get referenced to go on providing a minimally reproducible dataset if you want more info on what constitutes as "good" for sharing: https://stackoverflow.com/help/minimal-reproducible-example – Shawn Hemelstrand Sep 26 '22 at 01:14

1 Answers1

2

I'm not sure if your intent was just grouping by name and fruit, but its better to use mutate here. You also wouldn't use ungroup after unless you were doing manipulations to the data after:

df %>% 
  group_by(Name,
           Fruit) %>% 
  mutate(Name.Fruit.Unique = n()) 

Which gives you this. You can see for example Tom and Pear show up three times, and it is listed so in the data frame:

      No Name  Fruit Country Name.Fruit.Unique
   <dbl> <chr> <chr> <chr>               <int>
 1     1 Tom   Pear  France                  3
 2     2 Tom   Pear  France                  3
 3     3 Tom   Pear  Poland                  3
 4     4 Bob   Kiwi  Poland                  1
 5     5 Bob   Pear  France                  1
 6     6 Ann   Pear  France                  2
 7     7 Ann   Pear  Poland                  2
 8     8 Dod   Pear  Poland                  2
 9     7 Dod   Plum  Poland                  2
10     8 Dod   Plum  Poland                  2
11     9 Dod   Pear  Germany                 2
12    10 Dod   Kiwi  Estonia                 1

Edit

It looks like I misinterpreted your question. For unique values grouped by name and then counting the unique number of fruits per person, this may be a better option:

df %>% 
  group_by(Name) %>% 
  mutate(Count = n_distinct(Fruit))

Which gives you this:

# A tibble: 12 × 5
# Groups:   Name [4]
      No Name  Fruit Country Count
   <dbl> <chr> <chr> <chr>   <int>
 1     1 Tom   Pear  France      1
 2     2 Tom   Pear  France      1
 3     3 Tom   Pear  Poland      1
 4     4 Bob   Kiwi  Poland      2
 5     5 Bob   Pear  France      2
 6     6 Ann   Pear  France      1
 7     7 Ann   Pear  Poland      1
 8     8 Dod   Pear  Poland      3
 9     7 Dod   Plum  Poland      3
10     8 Dod   Plum  Poland      3
11     9 Dod   Pear  Germany     3
12    10 Dod   Kiwi  Estonia     3
Shawn Hemelstrand
  • 2,676
  • 4
  • 17
  • 30