Summarize with character type conditions in dplyr

Question

I would like to count the number of times a country is listed alone and the number of times it is listed together with other countries.

This is a section of my dataset:

address_countries2
name_countries      n_countries
China               1                      
China               1
Usa                 1                        
Usa                 1
China France        2               
China France        2
India               1                      
India               1
Jordan Germany      2

I have used the following code to extract the number of times each country appears.

publication_countries <- address_countries2 %>% 
  select(name_countries, n_countries) %>% 
  unnest_tokens(word, name_countries) %>%
  group_by(word) %>% 
  summarise(TP = n())

 head(publication_countries)
 # A tibble: 6 x 2
    word          TP
    <chr>       <int>
   1 China         4
   2 Usa           2
   3 France        2
   4 India         2
   5 Jordan        1       
   6 Germany       1

I would like to create a new column with the number of rows in which a country is listed on its own, and a second column with the number of rows in which it is listed with other countries.

Desired output, something like this:

 head(publication_countries)
 # A tibble: 6 x 4
    word          TP      single_times      with_other_countries
    <chr>       <int>            <int>                     <int>   
   1 China         4                2                         2
   2 Usa           2                2                         0
   3 France        2                0                         2
   4 India         2                2                         0
   5 Jordan        1                0                         1
   6 Germany       1                0                         1

From this link I have seen a possible way to summarise with a condition. However, in my case I would need to use something other than sum(), as the values I want to count are characters (column word).

summarise(TP = n() , IP = count(word[n_countries=="1"]))

But I get this error:

Error in summarise_impl(.data, dots) : 
  Evaluation error: no applicable method for 'groups' applied to an object of class "character"

Any help would be appreciated :)

Many thanks

score 2 · Accepted Answer · answered Feb 08 '18 at 19:39


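# dat is the question's address_countries2; needs dplyr, tidyr and tidytext
dat %>%
  select(name_countries, n_countries) %>%
  unnest_tokens(word, name_countries) %>%  # one row per country, lowercased
  group_by(word) %>%
  mutate(TP = n()) %>%                     # total rows per country
  group_by(n_countries, word) %>%
  mutate(Tp1 = n()) %>%                    # rows per country and n_countries value
  unique() %>%
  spread(n_countries, Tp1, 0)              # one column per n_countries value, 0 if absent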
# A tibble: 6 x 4
# Groups:   word [6]
     word    TP   `1`   `2`
*   <chr> <int> <dbl> <dbl>
1   china     4     2     2
2  france     2     0     2
3 germany     1     0     1
4   india     2     2     0
5  jordan     1     0     1
6     usa     2     2     0
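
The two named columns from the question can also be produced without the wide step. The following is a minimal sketch, not part of the original answer: it counts conditionally with sum() on a logical test of n_countries, which works because the condition is numeric even though the grouping column is a character. It assumes country names are single words separated by spaces, and it swaps in tidyr's separate_rows() for unnest_tokens() so the capitalisation of the names is kept.

```r
library(dplyr)
library(tidyr)

# Sample rows copied from the question
address_countries2 <- tibble(
  name_countries = c("China", "China", "Usa", "Usa", "China France",
                     "China France", "India", "India", "Jordan Germany"),
  n_countries = c(1, 1, 1, 1, 2, 2, 1, 1, 2)
)

publication_countries <- address_countries2 %>%
  # Assumption: names are split on single spaces (stand-in for unnest_tokens())
  separate_rows(name_countries, sep = " ") %>%
  group_by(word = name_countries) %>%
  summarise(
    TP = n(),                                    # all rows for the country
    single_times = sum(n_countries == 1),        # rows where it is listed alone
    with_other_countries = sum(n_countries > 1)  # rows shared with any other country
  )

print(publication_countries)
```

Because the second condition is n_countries > 1 rather than n_countries == 2, rows with three or more countries fall into with_other_countries as well.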


Onyambu


There is only one little problem. In my full data sample the values of n_countries vary from 1 to 3, which gives me three columns when grouping by n_countries. Is there a way to combine all the columns other than the one for 1? – Amleto Feb 08 '18 at 23:56
I'm sorry, I don't understand your question – Onyambu Feb 09 '18 at 00:09
Sometimes there are more than two countries in a row in "name_countries", so for example with 3 countries n_countries = 3. This gives me three columns when using your code. But I would like only two columns, one for countries listed alone and one for countries listed with any number of others. – Amleto Feb 09 '18 at 00:21
Is it not possible to `mutate` all the columns that need to be combined, i.e. add them together to get one column? – Onyambu Feb 09 '18 at 00:27
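
The suggestion in the last comment could look roughly like the sketch below: after spreading, every count column except `1` is added up into a single column. The sample data is hypothetical (the question's shape plus one three-country row), separate_rows() again stands in for unnest_tokens(), and across() assumes dplyr 1.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample: the question's layout with one three-country row added
dat <- tibble(
  name_countries = c("China", "China France", "China France India", "India"),
  n_countries = c(1, 2, 3, 1)
)

wide <- dat %>%
  separate_rows(name_countries, sep = " ") %>%        # stand-in for unnest_tokens()
  count(word = name_countries, n_countries, name = "Tp1") %>%
  spread(n_countries, Tp1, fill = 0)                  # columns `1`, `2`, `3`

combined <- wide %>%
  # Add together every count column other than `1`, however many there are
  mutate(with_other_countries = rowSums(across(-c(word, `1`)))) %>%
  select(word, single_times = `1`, with_other_countries)

print(combined)
```

Selecting the columns negatively means the same code should keep working if a later batch of data introduces rows with four or more countries.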
