
I have data that looks like this:

[screenshot of the data: a `country` column plus one character column per seller, with NA where a seller was not chosen]

I want to be able to see how often each seller is chosen within each country. I have done it the long and slow way like this:

competitor_by_country <- df %>% 
  group_by(country) %>% 
  summarise(
    Test_count = sum(!is.na(Test)),
    Test2_count = sum(!is.na(Test2)),
    Shopify_count = sum(!is.na(Shopify_)),
    Aliexpress_count = sum(!is.na(Aliexpress)),
    JD_count = sum(!is.na(JD)),
    Flipkart_count = sum(!is.na(Flipkart_)),
    Rakuten_count = sum(!is.na(Rakuten_)),
    `John Lewis_count` = sum(!is.na(`John Lewis_`)),
    Otto_count = sum(!is.na(Otto_)),
    Noon_count = sum(!is.na(Noon_)),
    `Walmart (3rd Party)_count` = sum(!is.na(`Walmart (3rd Party)`)),
    `Amazon Vendor Central_count` = sum(!is.na(`Amazon Vendor Central_`)),
    `Walmart (Supplier_count` = sum(!is.na(`Walmart (Supplier`)),
    Zalando_count = sum(!is.na(Zalando_)),
    Tmall_count = sum(!is.na(Tmall))
  )

But this was quite tedious, and I have other data with 50-100 columns. Can someone advise me on an approach to shorten this, such as a loop?

Here is the output of the current code:

[screenshot of the resulting summary table: one row per country with a count column per seller]

Stephen Poole
  • Please share a reproducible example of your data: https://stackoverflow.com/help/minimal-reproducible-example – deschen Dec 28 '21 at 19:25
  • with base R: `aggregate(. ~ country, data = df, sum, na.rm = TRUE)` – tjebo Dec 28 '21 at 19:38
  • Please do not post an image of code/data/errors: it breaks screen-readers and it cannot be copied or searched (ref: https://meta.stackoverflow.com/a/285557 and https://xkcd.com/2116/). Please just include the code, console output, or data (e.g., `data.frame(...)` or the output from `dput(head(x))`) directly. – r2evans Dec 28 '21 at 19:38
  • 2
    @tjebo that doesn't work if the columns to be summarized are `character` indicating the company names. – r2evans Dec 28 '21 at 19:40
  • 1
    @r2evans fair enough. Poor question, poor answer :D – tjebo Dec 28 '21 at 19:40
  • 2
    @tjebo Your suggested code does not work any less effectively on images of data ;-) – r2evans Dec 28 '21 at 19:42
  • 1
    The question overall isn't bad. It just needs the data in an appropriate format, not a screenshot. –  Dec 28 '21 at 19:47
  • StephenPoole, the question is close to being closed (for a wrong reason, but still ...). I suggest you either add usable data and/or accept an answer (I suggest Mr.Rlover's answer is canonical, complete, and clear). – r2evans Dec 28 '21 at 19:53
  • @r2evans Can you recommend an effective way to share tabular data on Stack? – Stephen Poole Dec 28 '21 at 21:59
  • @StephenPoole, the "gold standard" (unambiguous, instantly usable) is `dput(x)` where `x` is the sample data. Another common method is the literal text of `read.table(text="...")`. Both are discussed here: https://stackoverflow.com/q/5963269/3358272. – r2evans Dec 29 '21 at 00:26

2 Answers

5
competitor_by_country <- df %>% 
  group_by(country) %>% 
  summarise(across(where(is.numeric), function(x) sum(!is.na(x))))

Or, for non-numeric columns, simply exclude the grouping column:

competitor_by_country <- df %>% 
  group_by(country) %>% 
  summarise(across(.cols = everything(), function(x) sum(!is.na(x))))
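Putting the `across()` approach together on a small made-up data frame (the seller columns `Test` and `Tmall` and the values are hypothetical stand-ins, since the question's data is only a screenshot):

```r
library(dplyr)

# Toy stand-in for the question's data: one character column per seller,
# with NA meaning "not chosen" for that record
df <- tibble(
  country = c("US", "US", "UK"),
  Test    = c("a", NA, "b"),
  Tmall   = c(NA, NA, "c")
)

df %>%
  group_by(country) %>%
  summarise(across(everything(), ~ sum(!is.na(.x))))
# Counts per country: UK -> Test 1, Tmall 1; US -> Test 1, Tmall 0
```

Note that inside a grouped `summarise()`, `across(everything())` already skips the grouping column, which is why `.cols = -country` fails with "can't subset columns that don't exist".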
Mr.Rlover
  • This worked great! I had to change .cols = -country to .cols = c(Test:Tmall) because I was getting a "can't subset columns that don't exist" error for the variable country. I don't know the reason for the error, and apologies for not posting reproducible code, but can you think of a reason I'd get this error? Is it related to the use of the group_by perhaps? – Stephen Poole Dec 28 '21 at 22:08
  • I didn't test the code because of no data. I think the group_by caused your error. You can use .cols = everything() instead of Test:Tmall – Mr.Rlover Dec 29 '21 at 08:21
3
library(tidyverse)
competitor_by_country <- df %>%
  pivot_longer(-country) %>%
  group_by(country, name) %>%
  summarize(count = sum(!is.na(value))) %>%
  ungroup() %>%
  pivot_wider(values_from = count)
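The same made-up toy data (hypothetical columns `Test` and `Tmall`, standing in for the question's screenshot) run through this pivot approach produces the identical per-country counts:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the question's data
df <- tibble(
  country = c("US", "US", "UK"),
  Test    = c("a", NA, "b"),
  Tmall   = c(NA, NA, "c")
)

df %>%
  pivot_longer(-country) %>%
  group_by(country, name) %>%
  summarise(count = sum(!is.na(value)), .groups = "drop") %>%
  pivot_wider(names_from = name, values_from = count)
# One row per country, one column per seller holding its non-NA count
```

The long format has the advantage that it works unchanged whether there are 15 or 100 seller columns, since the column names become values in `name`.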
deschen