I have example data which looks as follows:

library(dplyr)
library(tidyr)

# example data frame
df <- data.frame(
  col1 = c("A;B;C", "A;B", "B;C", "A;C", "B", "A;B;C;D"),
  col2 = c("X;Y;Z", "X;Y", "Y;Z", "X;Z", "Z", "W;X;Y;Z"),
  col3 = c("1;2", "1", "2;3", "3", "4;5;6", "7"),
  col4 = c(1, 2, 3, 4, 5, 6),
  col5 = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
)

# select columns to separate
selected_cols <- c("col1", "col2", "col3", "col4", "col5")

However, the following code does not work:

# separate rows within selected columns that are character columns
df_separated <- df %>% 
  mutate(across(where(is.character), ~ separate_rows(., sep = ";")))

It gives the error:

Error in `mutate()`:
ℹ In argument: `across(where(is.character), ~separate_rows(., sep
  = ";"))`.
Caused by error in `across()`:
! Can't compute column `col1`.
Caused by error in `UseMethod()`:
! no applicable method for 'separate_rows' applied to an object of class "character"
Run `rlang::last_error()` to see where the error occurred.

I assumed that the whole point of separate_rows is to be applied to character columns, so something is going wrong.

Background

I wanted to make a bar chart of every column of my data set, for which I found this very nice solution by Ronak Shah.

library(ggplot2)

lapply(names(df), function(col) {
  ggplot(df, aes(.data[[col]], after_stat(count))) + 
    geom_bar(aes(fill = .data[[col]]), position = "dodge")
}) -> list_plots

Now my issue is that some of my columns contain multiple answers per cell, so the code does not work properly.

Tom
    you can `separate_rows(df, col1:col2)`, but you can't do `separate_rows(df, col1:col3)` in this example, because `col3` doesn't have the same structure as `col1` and `col2`. – langtang Mar 27 '23 at 13:17

2 Answers

You can pivot_longer first and only then separate_rows:

df %>% 
  pivot_longer(1:3) %>%
  separate_rows(value)
# A tibble: 38 × 4
    col4 col5  name  value
   <dbl> <lgl> <chr> <chr>
 1     1 TRUE  col1  A    
 2     1 TRUE  col1  B    
 3     1 TRUE  col1  C    
 4     1 TRUE  col2  X    
 5     1 TRUE  col2  Y    
 6     1 TRUE  col2  Z    
 7     1 TRUE  col3  1    
 8     1 TRUE  col3  2    
 9     2 FALSE col1  A    
10     2 FALSE col1  B    
# … with 28 more rows
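
Since the end goal is one bar chart per column, the long format can be plotted directly. The following is only a sketch, assuming the `df` from the question; the use of `facet_wrap()` and the object names `long` and `p` are my own choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Example data from the question
df <- data.frame(
  col1 = c("A;B;C", "A;B", "B;C", "A;C", "B", "A;B;C;D"),
  col2 = c("X;Y;Z", "X;Y", "Y;Z", "X;Z", "Z", "W;X;Y;Z"),
  col3 = c("1;2", "1", "2;3", "3", "4;5;6", "7"),
  col4 = c(1, 2, 3, 4, 5, 6),
  col5 = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE)
)

# Reshape the multi-answer columns to long format, then split each cell on ";"
long <- df %>%
  pivot_longer(col1:col3) %>%
  separate_rows(value, sep = ";")

# One panel per original column; free x scales because each column
# has its own set of answers
p <- ggplot(long, aes(value)) +
  geom_bar(aes(fill = value)) +
  facet_wrap(~ name, scales = "free_x")
```

This gives a single faceted figure rather than a list of plots, which may or may not suit your use case.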
Chris Ruehlemann

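First of all, separate_rows is not meant to be used inside mutate: it expects a data frame as its first argument, whereas across() passes each column to it as a plain character vector, which is why you get the "no applicable method" error. Second, separating multiple columns at once only works if they contain the same number of elements per cell. As that is not the case for your columns, one option would of course be to reshape to long as suggested by @ChrisRuehlemann.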
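
To illustrate both points, here is a small sketch using the first three rows of the example data. The object names (`one_col`, `two_cols`, `res`) are made up for illustration, and the exact error message for the failing call may differ between tidyr versions.

```r
library(tidyr)

# First three rows of the multi-answer columns from the question
df_small <- data.frame(
  col1 = c("A;B;C", "A;B", "B;C"),
  col2 = c("X;Y;Z", "X;Y", "Y;Z"),
  col3 = c("1;2", "1", "2;3")
)

# separate_rows() takes the data frame first, then the columns to split
one_col <- separate_rows(df_small, col1, sep = ";")

# col1 and col2 have the same number of pieces in every row,
# so they can be separated together
two_cols <- separate_rows(df_small, col1, col2, sep = ";")

# col3 has a different number of pieces (row 1: 3 vs. 2),
# so separating all three at once is expected to fail
res <- tryCatch(
  separate_rows(df_small, col1:col3, sep = ";"),
  error = function(e) NULL
)
```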

However, as your final goal is to make a bar chart of each column, another option would be to move the separate_rows step into your plotting function:

library(ggplot2)
library(tidyr)

lapply(c("col1", "col3"), function(col) {
  separate_rows(df, all_of(col), sep = ";") |>
    ggplot(aes(.data[[col]])) +
    geom_bar(aes(fill = .data[[col]]))
})
#> [[1]]
#> (plot: bar chart of col1)
#> 
#> [[2]]
#> (plot: bar chart of col3)
stefan