3

How can I create a cross table in R (RStudio), where I count occurrences.

I have this sample input:

Technology <- c("A", "A", "B", "C", "C", "C")
Development <- c(1, 0, 1, 1, 1, 1)
Production <- c(1, 1, 0, 0, 0, 1)
Sales <- c(0, 0, 1, 1, 0, 1)
DF <- data.frame(Technology, Development, Production, Sales)

I want to know in which domain which technology is used most often.

The result should look like in the picture.

enter image description here

Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
Mec-Eng
  • 199
  • 10

3 Answers3

3

These problems are many times a data format problem and the solution is to reshape from wide to long format first, see this question.


Here is a base R solution with reshape and cross tabulation with xtabs.

Technology <- c("A", "A", "B", "C", "C", "C")
Development <- c(1, 0, 1, 1, 1, 1)
Production <- c(1, 1, 0, 0, 0, 1)
Sales <- c(0, 0, 1, 1, 0, 1)
DF <- data.frame(Technology, Development, Production, Sales)

reshape(
  DF,
  direction = "long",
  varying = list(names(DF[-1])),
  v.names = "Active",
  times = names(DF[-1]),
  timevar = "Phase"
) |>
  (\(x) xtabs(Active ~ Phase + Technology, x))()
#>              Technology
#> Phase         A B C
#>   Development 1 1 3
#>   Production  2 0 1
#>   Sales       0 1 2

Created on 2022-04-18 by the reprex package (v2.0.1)


And a tidyverse solution.

suppressPackageStartupMessages({
  library(magrittr)
  library(tidyr)
})

DF %>%
  pivot_longer(-Technology) %>%
  xtabs(value ~ name + Technology, .)
#>              Technology
#> name          A B C
#>   Development 1 1 3
#>   Production  2 0 1
#>   Sales       0 1 2

Created on 2022-04-18 by the reprex package (v2.0.1)

Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
1

Here is a tidyverse approach, to get your desired output:

  1. We group by Technology to summarise with across
  2. then we prepare the rownames with paste and apply column_to_rownames from tibble
  3. finally we could transform with t()
library(dplyr)
library(tibble)
DF %>% 
  group_by(Technology) %>% 
  summarise(across(c(Development, Production, Sales), sum)) %>% 
  mutate(Technology = paste("Technology", Technology, sep = " ")) %>% 
  column_to_rownames("Technology") %>% 
  t()
            Technology A Technology B Technology C
Development            1            1            3
Production             2            0            1
Sales                  0            1            2
TarJae
  • 72,363
  • 6
  • 19
  • 66
  • 1
    That's a great solution. Reminds me of a SQL query :) This makes it intuitive for me as well. – Mec-Eng Apr 18 '22 at 17:11
0

Since you asked for a crosstable, you can also use the package crosstable for that:

library(crosstable)
crosstable(DF, by=Technology)%>% 
  as_flextable()

enter image description here

However, in your case, you don't care about proportions and you only need the numbers when each variable is 1, so you might want to run instead:

library(dplyr)
crosstable(DF, by=Technology, percent_pattern="{n}") %>% 
    filter(variable==1) %>% select(-variable) %>% 
    as_flextable()

enter image description here

More info about the package at https://danchaltiel.github.io/crosstable/.

Dan Chaltiel
  • 7,811
  • 5
  • 47
  • 92