Split a column in R

Question

In my data frame I have some semi-structured data in a column.

df
col1
a|b|c
a b1|b|c
a & b2|b|c 3

From this column (df$col1) I want to extract only the first word before the "|".

I tried using this

df$col2 <- unlist(strsplit(as.character(df$col1),"[|]"))[[1]][1]

but the result had the same value, "a", on every row. Why does this happen, and how can I handle it?

Thanks

What is your expected output? Perhaps `library(stringr);str_extract(df$col1, "[[:alnum:]]+(?=\\|)")` — akrun, Jul 07 '16 at 19:09
`library(tidyr) ; df %>% separate(col1, into = 'col2', sep = '\\|', extra = 'drop', remove = FALSE)` — alistaire, Jul 07 '16 at 19:12
Possible duplicate of [Separating a column element into 3 separate columns (R)](http://stackoverflow.com/questions/25194174/separating-a-column-element-into-3-separate-columns-r) — alistaire, Jul 07 '16 at 19:15

score 0 · Answer 1 · answered Jul 07 '16 at 19:15

If we need to extract the characters before the first |:

sub("[|].*", "", df$col1)
#[1] "a"      "a b1"   "a & b2"

If we want to extract only the word immediately before the first |:

library(stringr)
str_extract(df$col1, "[[:alnum:]]+(?=\\|)")  
#[1] "a"  "b1" "b2"
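For anyone who wants to run both variants without installing stringr, here is a minimal base R sketch. The data frame is rebuilt from the three rows shown in the question (stringsAsFactors = FALSE is an assumption), and regmatches() with regexpr(perl = TRUE) is swapped in for str_extract(); the output column names before_pipe and last_word are hypothetical.

```r
# Rebuild the example data from the question (assumed to be plain strings).
df <- data.frame(col1 = c("a|b|c", "a b1|b|c", "a & b2|b|c 3"),
                 stringsAsFactors = FALSE)

# Everything before the first "|": delete from the first "|" to the end.
df$before_pipe <- sub("[|].*", "", df$col1)

# Only the word directly in front of the first "|", using a lookahead.
# Note: regmatches() drops elements with no match, so this assignment
# assumes every row contains a word followed by "|".
m <- regexpr("[[:alnum:]]+(?=\\|)", df$col1, perl = TRUE)
df$last_word <- regmatches(df$col1, m)

print(df)
```

Unlike str_extract(), which returns NA for rows without a match, regmatches() returns a shorter vector, so the stringr version is the safer choice when some rows may lack a "|".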

answered Jul 07 '16 at 19:15

akrun

score 0 · Answer 2 · edited May 23 '17 at 12:01

unlist() flattens the result of strsplit() into a single character vector, so [[1]][1] picks out only its first element, the single string "a". Because of R's recycling rule, that one value is then repeated for every row in the column.

t <- c("a|junk", "a b|junk", "a b1|junk")
unlist(strsplit(as.character(t),"[|]"))[[1]][1]
#[1] "a"
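To see where the single value comes from, it can help to look at each intermediate step. The sketch below uses the three rows from the question; the variable names parts, flat, and first are only illustrative.

```r
x <- c("a|b|c", "a b1|b|c", "a & b2|b|c 3")

parts <- strsplit(x, "[|]")  # a list with one character vector per row
flat  <- unlist(parts)       # flattened: one character vector of all pieces
first <- flat[[1]][1]        # the first piece of the flattened vector only

length(parts)  # 3
length(flat)   # 9
first          # "a"

# Assigning a length-one value to a data frame column recycles it to every row.
df <- data.frame(col1 = x, stringsAsFactors = FALSE)
df$col2 <- first
df$col2        # "a" "a" "a"
```

The per-row structure is lost at the unlist() step, which is why indexing has to happen on the list itself, one element at a time.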

For column splitting, I like to use strsplit() in combination with sapply(). This was something that Hadley Wickham had posted about already on SO.

df$col2 <- sapply(strsplit(as.character(df$col1),"[|]"), "[", 1)

https://stackoverflow.com/a/1355660/1146646
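A self-contained version of the same idea, as a sketch: the data frame is rebuilt from the question (assuming plain strings), and vapply() is swapped in for sapply() so the result is guaranteed to be one string per row. fixed = TRUE is used here instead of the "[|]" character class; both treat "|" literally.

```r
df <- data.frame(col1 = c("a|b|c", "a b1|b|c", "a & b2|b|c 3"),
                 stringsAsFactors = FALSE)

# Split each row on a literal "|" and keep the list structure (one element per row).
parts <- strsplit(df$col1, "|", fixed = TRUE)

# Take the first piece of each row; character(1) makes vapply() fail loudly
# if any row does not yield exactly one string.
df$col2 <- vapply(parts, function(p) p[1], character(1))

df$col2
```

sapply(parts, "[", 1) gives the same values on this data; vapply() only adds the type check.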