Create new columns based on part of an observation in another column in R

Question

I have a data frame called "encuesta" with this column:

Orientación prioritaria
<chr>
Gastroenterología clinica;Endoscopia digestiva;Motilidad y Neurogastro
Gastroenterología clinica;Endoscopia digestiva;Motilidad y Neurogastro
Gastroenterología clinica;Endoscopia digestiva;Motilidad y Neurogastro
Gastroenterología clinica;Endoscopia digestiva
Motilidad y Neurogastro
Gastroenterología clinica;Motilidad y Neurogastro

I want to create new columns based on the values separated by ";" in each row.

I have tried something like this for example:

encuesta$`Gastroenterología clínica` <- encuesta$`Orientación prioritaria` %in% str_detect(encuesta$`Orientación prioritaria`, regex("Gastroenterología"))

It creates a new column Gastroenterología clínica but it evaluates every observation to FALSE and I don't understand why.
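
A note on why every row comes out FALSE: the pattern match already returns one TRUE/FALSE per row, and `%in%` then asks whether each string in the column equals one of those logical values, which never happens. The sketch below shows this with base R's `grepl()` on a small made-up vector `x` (a stand-in for the real column, not the actual survey data):

```r
# Hypothetical stand-in for encuesta$`Orientación prioritaria`
x <- c("Gastroenterología clinica;Endoscopia digestiva",
       "Motilidad y Neurogastro")

# The pattern match alone already gives one TRUE/FALSE per row
hit <- grepl("Gastroenterología", x, fixed = TRUE)

# Wrapping it in %in% compares the strings themselves with TRUE/FALSE
# (coerced to "TRUE"/"FALSE"), so no element ever matches
wrong <- x %in% hit

hit
wrong
```

Assigning the result of the match itself (from `str_detect()` or `grepl()`) to the new column, without `%in%`, should give the intended logical column.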

akrun · Answer 1 · 2020-09-12T22:56:03.260

Here is one option with mtabulate

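library(qdapTools)
# One count column per distinct value found after splitting on ";"
tab <- mtabulate(strsplit(encuesta$`Orientación prioritaria`, ";"))
m1 <- cbind(encuesta, tab)
# Totals: sum only the numeric columns, since colSums() fails on the character column in m1
colSums(tab, na.rm = TRUE)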

Or another option is cSplit_e

library(splitstackshape)
cSplit_e(encuesta, "Orientación prioritaria", sep=";", type = "character", fill = 0)
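
For readers without either package, a similar 0/1 table can be sketched in base R. This is only an illustration of the idea, not what `mtabulate` or `cSplit_e` do internally; the vector `x` and the names `parts`, `vals` and `ind` are made up for the example:

```r
# Hypothetical stand-in for encuesta$`Orientación prioritaria`
x <- c("Gastroenterología clinica;Endoscopia digestiva;Motilidad y Neurogastro",
       "Gastroenterología clinica;Endoscopia digestiva",
       "Motilidad y Neurogastro")

parts <- strsplit(x, ";", fixed = TRUE)  # list: one character vector per row
vals  <- unique(unlist(parts))           # distinct values, in order of appearance

# For each row, flag which of the distinct values it contains (1) or not (0);
# vapply() returns values-by-rows, so transpose to get one row per observation
ind <- t(vapply(parts, function(p) as.integer(vals %in% p), integer(length(vals))))
colnames(ind) <- vals
ind
```

The resulting matrix can be attached to the original data with `cbind()`, and `colSums(ind)` gives the totals per value.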

edited Sep 12 '20 at 22:56

answered Sep 11 '20 at 22:07


`mtabulate` was a very useful approach, together with `summarise` for totals from @Duck's approach. – Ale Rey Sep 12 '20 at 17:12

@AleRey If you want the totals, just use `colSums`. Updated the post. – akrun Sep 12 '20 at 22:56

Duck · Accepted Answer · 2020-09-12T16:53:19.173

Try this approach with separate() from tidyr (tidyverse):

library(tidyverse)
# Split V1 into three columns; rows with fewer values are padded with NA on the right
df2 <- df %>% separate(col = V1, into = c('a', 'b', 'c'), sep = ';', fill = 'right')

Output:

                          a                       b                       c
1 Gastroenterología clinica    Endoscopia digestiva Motilidad y Neurogastro
2 Gastroenterología clinica    Endoscopia digestiva Motilidad y Neurogastro
3 Gastroenterología clinica    Endoscopia digestiva Motilidad y Neurogastro
4 Gastroenterología clinica    Endoscopia digestiva                    <NA>
5   Motilidad y Neurogastro                    <NA>                    <NA>
6 Gastroenterología clinica Motilidad y Neurogastro                    <NA>

Some data used:

#Data
df <- structure(list(V1 = c("Gastroenterología clinica;Endoscopia digestiva;Motilidad y Neurogastro", 
"Gastroenterología clinica;Endoscopia digestiva;Motilidad y Neurogastro", 
"Gastroenterología clinica;Endoscopia digestiva;Motilidad y Neurogastro", 
"Gastroenterología clinica;Endoscopia digestiva", "Motilidad y Neurogastro", 
"Gastroenterología clinica;Motilidad y Neurogastro")), class = "data.frame", row.names = c(NA, 
-6L))

Update: to get one variable per value, here is the code:

#Code
df %>% separate_rows(V1,sep=';') %>%
  mutate(V=paste0('V',1:n())) %>%
  pivot_wider(names_from = V,values_from=V1)

Output:

# A tibble: 1 x 14
  V1      V2     V3      V4      V5    V6     V7     V8    V9     V10    V11   V12    V13    V14   
  <chr>   <chr>  <chr>   <chr>   <chr> <chr>  <chr>  <chr> <chr>  <chr>  <chr> <chr>  <chr>  <chr> 
1 Gastro~ Endos~ Motili~ Gastro~ Endo~ Motil~ Gastr~ Endo~ Motil~ Gastr~ Endo~ Motil~ Gastr~ Motil~

Update 2: In order to have a variable for each class, try this:

#Code 2
df %>% mutate(id = row_number()) %>% separate_rows(V1, sep = ';') %>%
  mutate(var = 1) %>%
  pivot_wider(names_from = V1, values_from = var) %>%
  replace(is.na(.), 0) %>% select(-id)

Output:

# A tibble: 6 x 3
  `Gastroenterología clinica` `Endoscopia digestiva` `Motilidad y Neurogastro`
                        <dbl>                  <dbl>                     <dbl>
1                           1                      1                         1
2                           1                      1                         1
3                           1                      1                         1
4                           1                      1                         0
5                           0                      0                         1
6                           1                      0                         1

And if you want the totals, try this:

#Code 3
df %>% mutate(id = row_number()) %>% separate_rows(V1, sep = ';') %>%
  mutate(var = 1) %>%
  pivot_wider(names_from = V1, values_from = var) %>% select(-id) %>%
  summarise_all(.funs = sum, na.rm = TRUE)

Output:

# A tibble: 1 x 3
  `Gastroenterología clinica` `Endoscopia digestiva` `Motilidad y Neurogastro`
                        <dbl>                  <dbl>                     <dbl>
1                           5                      4                         5
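
As a cross-check on these totals, the same counts can be obtained in base R by splitting every row and tabulating the pieces. This is a sketch on the six strings from `df$V1` above; the names `v1` and `counts` are made up for the example. Note that it counts occurrences, which matches the per-row totals only as long as no value repeats within a row.

```r
# Same six strings as df$V1 in the answer above
v1 <- c("Gastroenterología clinica;Endoscopia digestiva;Motilidad y Neurogastro",
        "Gastroenterología clinica;Endoscopia digestiva;Motilidad y Neurogastro",
        "Gastroenterología clinica;Endoscopia digestiva;Motilidad y Neurogastro",
        "Gastroenterología clinica;Endoscopia digestiva",
        "Motilidad y Neurogastro",
        "Gastroenterología clinica;Motilidad y Neurogastro")

# Split each row on ";", flatten to one long vector, and count each value
counts <- table(unlist(strsplit(v1, ";", fixed = TRUE)))
counts
```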

Sorry, I can't understand. What do you mean by your question? – Duck Sep 12 '20 at 15:34
Is there any way to make the values line up within a column? E.g. col a = `Gastroenterología clínica`, col b = `Endoscopia digestiva`, col c = `Motilidad y neurogastro`, and so on. – Ale Rey Sep 12 '20 at 15:37
@AleRey I have added an update; please check whether that works for you :) – Duck Sep 12 '20 at 15:47
Thanks @Duck! I've just found a quite useful approach with `mtabulate(strsplit(encuesta$`Orientación prioritaria`, ";"))` – Ale Rey Sep 12 '20 at 16:35
@AleRey Basically, what you want is a 0/1 variable for each class, right? – Duck Sep 12 '20 at 16:46
@AleRey I have added an update. Please check and let me know if that is close to what you want! – Duck Sep 12 '20 at 16:53

score 0 · Answer 3 · answered Sep 11 '20 at 22:30

A base R option

data.frame(do.call(
  rbind,
  lapply(
    u <- strsplit(encuesta$`Orientación prioritaria`, ";"),
    `length<-`,
    max(lengths(u))
  )
))
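
The key step here is `` `length<-` ``, the functional form of `length(x) <- n`: lengthening a vector this way pads it with NA, so every split row ends up with the same number of pieces before `rbind`. A minimal sketch on a toy list (the values in `u` are made up for illustration):

```r
u <- list(c("a", "b", "c"), c("a", "b"), "c")  # toy result of strsplit()
n <- max(lengths(u))                           # longest row has 3 pieces

# `length<-`(x, n) is the same as length(x) <- n; shorter vectors get NA-padded
padded <- lapply(u, `length<-`, n)

# Stack the equal-length vectors into a matrix, one row per observation
res <- do.call(rbind, padded)
res
```

Note that this splits by position (first, second, third value in each row), so the same value can land in different columns for different rows.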

ThomasIsCoding
