
I have data that looks like this:

[screenshot of the data: a `country` column plus one character column per seller, with NA where a seller was not chosen]

I want to be able to see how often each seller is chosen within each country. I have done it the long and slow way like this:

competitor_by_country <- df %>% 
  group_by(country) %>% 
  summarise(
    Test_count = sum(!is.na(Test)),
    Test2_count = sum(!is.na(Test2)),
    Shopify_count = sum(!is.na(Shopify_)),
    Aliexpress_count = sum(!is.na(Aliexpress)),
    JD_count = sum(!is.na(JD)),
    Flipkart_count = sum(!is.na(Flipkart_)),
    Rakuten_count = sum(!is.na(Rakuten_)),
    `John Lewis_count` = sum(!is.na(`John Lewis_`)),
    Otto_count = sum(!is.na(Otto_)),
    Noon_count = sum(!is.na(Noon_)),
    `Walmart (3rd Party)_count` = sum(!is.na(`Walmart (3rd Party)`)),
    `Amazon Vendor Central_count` = sum(!is.na(`Amazon Vendor Central_`)),
    `Walmart (Supplier_count` = sum(!is.na(`Walmart (Supplier`)),
    Zalando_count = sum(!is.na(Zalando_)),
    Tmall_count = sum(!is.na(Tmall))
  )

But this was quite tedious, and I have other data with 50-100 columns. Can someone advise me on an approach to shorten this, such as a loop?

Here is the output of the current code:

[screenshot of the resulting summary table: one row per country with a count column per seller]

Stephen Poole
  • Please share a reproducible example of your data: https://stackoverflow.com/help/minimal-reproducible-example – deschen Dec 28 '21 at 19:25
  • with base R: `aggregate(. ~ country, data = df, sum, na.rm = TRUE)` – tjebo Dec 28 '21 at 19:38
  • Please do not post an image of code/data/errors: it breaks screen-readers and it cannot be copied or searched (ref: https://meta.stackoverflow.com/a/285557 and https://xkcd.com/2116/). Please just include the code, console output, or data (e.g., `data.frame(...)` or the output from `dput(head(x))`) directly. – r2evans Dec 28 '21 at 19:38
  • 2
    @tjebo that doesn't work if the columns to be summarized are `character` indicating the company names. – r2evans Dec 28 '21 at 19:40
  • 1
    @r2evans fair enough. Poor question, poor answer :D – tjebo Dec 28 '21 at 19:40
  • 2
    @tjebo Your suggested code does not work any less effectively on images of data ;-) – r2evans Dec 28 '21 at 19:42
  • 1
    The question overall isn't bad. It just needs the data in an appropriate format, not a screenshot. –  Dec 28 '21 at 19:47
  • StephenPoole, the question is close to being closed (for a wrong reason, but still ...). I suggest you either add usable data and/or accept an answer (I suggest Mr.Rlover's answer is canonical, complete, and clear). – r2evans Dec 28 '21 at 19:53
  • @r2evans Can you recommend an effective way to share tabular data on Stack? – Stephen Poole Dec 28 '21 at 21:59
  • @StephenPoole, the "gold standard" (unambiguous, instantly usable) is `dput(x)` where `x` is the sample data. Another common method is the literal text of `read.table(text="...")`. Both are discussed here: https://stackoverflow.com/q/5963269/3358272. – r2evans Dec 29 '21 at 00:26

2 Answers

5
competitor_by_country <- df %>% 
  group_by(country) %>% 
  summarise(across(where(is.numeric), function(x) sum(!is.na(x))))

Or, for non-numeric columns, simply exclude the grouping column:

competitor_by_country <- df %>% 
  group_by(country) %>% 
  summarise(across(.cols = everything(), function(x) sum(!is.na(x))))
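Putting the `across()` approach together on a small made-up data frame (the seller columns `Test` and `Tmall` and the values are hypothetical stand-ins, since the question's data is only a screenshot):

```r
library(dplyr)

# Toy stand-in for the question's data: one character column per seller,
# with NA meaning "not chosen" for that record
df <- tibble(
  country = c("US", "US", "UK"),
  Test    = c("a", NA, "b"),
  Tmall   = c(NA, NA, "c")
)

df %>%
  group_by(country) %>%
  summarise(across(everything(), ~ sum(!is.na(.x))))
# Counts per country: UK -> Test 1, Tmall 1; US -> Test 1, Tmall 0
```

Note that inside a grouped `summarise()`, `across(everything())` already skips the grouping column, which is why `.cols = -country` fails with "can't subset columns that don't exist".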
Mr.Rlover
  • This worked great! I had to change .cols = -country to .cols = c(Test:Tmall) because I was getting a "can't subset columns that don't exist" error for the variable country. I don't know the reason for the error, and apologies for not posting reproducible code, but can you think of a reason I'd get this error? Is it related to the use of the group_by perhaps? – Stephen Poole Dec 28 '21 at 22:08
  • I didn't test the code because of no data. I think the group_by caused your error. You can use .cols = everything() instead of Test:Tmall – Mr.Rlover Dec 29 '21 at 08:21
3
library(tidyverse)
competitor_by_country <- df %>%
  pivot_longer(-country) %>%
  group_by(country, name) %>%
  summarize(count = sum(!is.na(value))) %>%
  ungroup() %>%
  pivot_wider(values_from = count)
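The same made-up toy data (hypothetical columns `Test` and `Tmall`, standing in for the question's screenshot) run through this pivot approach produces the identical per-country counts:

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the question's data
df <- tibble(
  country = c("US", "US", "UK"),
  Test    = c("a", NA, "b"),
  Tmall   = c(NA, NA, "c")
)

df %>%
  pivot_longer(-country) %>%
  group_by(country, name) %>%
  summarise(count = sum(!is.na(value)), .groups = "drop") %>%
  pivot_wider(names_from = name, values_from = count)
# One row per country, one column per seller holding its non-NA count
```

The long format has the advantage that it works unchanged whether there are 15 or 100 seller columns, since the column names become values in `name`.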
deschen