
In my data frame I have some semi-structured data in a column.

df
col1
a|b|c
a b1|b|c
a & b2|b|c 3

From df$col1 I want to extract only the first word before the "|".

I tried this:

df$col2 <- unlist(strsplit(as.character(df$col1),"[|]"))[[1]][1]

but the result has the same value, "a", in every row. Why does this happen, and how can I fix it?

Thanks

Hack-R
amith
  • What is your expected output? Perhaps `library(stringr);str_extract(df$col1, "[[:alnum:]]+(?=\\|)")` – akrun Jul 07 '16 at 19:09
  • `library(tidyr) ; df %>% separate(col1, into = 'col2', sep = '\\|', extra = 'drop', remove = FALSE)` – alistaire Jul 07 '16 at 19:12
  • Possible duplicate of [Separating a column element into 3 separate columns (R)](http://stackoverflow.com/questions/25194174/separating-a-column-element-into-3-separate-columns-r) – alistaire Jul 07 '16 at 19:15

2 Answers


If we need to extract everything before the first |:

sub("[|].*", "", df$col1)
#[1] "a"      "a b1"   "a & b2"

If we want only the word (the run of alphanumeric characters) that immediately precedes the first |:

library(stringr)
str_extract(df$col1, "[[:alnum:]]+(?=\\|)")  
#[1] "a"  "b1" "b2"
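
If stringr is not available, roughly the same extraction can be sketched in base R. This is only an illustration: the vector `x` and the names `m`, `hit`, and `words` are made up here, and `perl = TRUE` is needed because the pattern uses a lookahead. Note that regmatches() silently drops elements with no match, so the sketch fills those with NA to keep the result aligned with the input.

```r
# Sample values copied from the question's col1
x <- c("a|b|c", "a b1|b|c", "a & b2|b|c 3")

# Position of the first alphanumeric run that is directly followed by "|"
m <- regexpr("[[:alnum:]]+(?=\\|)", x, perl = TRUE)

# regexpr() returns -1 where nothing matched
hit <- as.integer(m) > 0

# Start with NA everywhere, then fill in the rows that matched
words <- rep(NA_character_, length(x))
words[hit] <- regmatches(x, m)

words
```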
akrun

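strsplit() returns a list with one character vector per row, but unlist() flattens that list into a single vector, so [[1]][1] picks out only the very first piece of the first row. Because of R's recycling rule, that single string is then repeated for every row in the column.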

t <- c("a|junk", "a b|junk", "a b1|junk")
unlist(strsplit(as.character(t),"[|]"))[[1]][1]
#[1] "a"
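
To make the intermediate steps visible, here is a small sketch that rebuilds the question's data frame (assuming col1 is a plain character column) and breaks the original one-liner apart. The names `parts`, `flat`, and `first` are only for illustration.

```r
# Data frame rebuilt from the question; stringsAsFactors = FALSE is an assumption
df <- data.frame(col1 = c("a|b|c", "a b1|b|c", "a & b2|b|c 3"),
                 stringsAsFactors = FALSE)

parts <- strsplit(as.character(df$col1), "[|]")  # list: one character vector per row
flat  <- unlist(parts)                           # one flat vector of all pieces
first <- flat[[1]][1]                            # a single string, the first piece overall

length(parts)  # number of rows
length(flat)   # total number of pieces across all rows

# Assigning a length-1 value to a column recycles it to every row
df$col2 <- first
df
```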

For column splitting, I like to combine strsplit() with sapply(), which takes the first piece of each row's split instead of the first piece overall. Hadley Wickham has already posted this approach on SO.

df$col2 <- sapply(strsplit(as.character(df$col1),"[|]"), "[", 1)

https://stackoverflow.com/a/1355660/1146646
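
For a self-contained check, the sketch below applies that idea to the question's sample data. The vapply() variant is my own addition rather than part of the linked answer; it behaves like sapply() here but errors if any element does not yield exactly one string.

```r
# Data frame rebuilt from the question; stringsAsFactors = FALSE is an assumption
df <- data.frame(col1 = c("a|b|c", "a b1|b|c", "a & b2|b|c 3"),
                 stringsAsFactors = FALSE)

pieces <- strsplit(as.character(df$col1), "[|]")

# Take element 1 of each row's pieces
df$col2 <- sapply(pieces, "[", 1)

# Same result with an explicit return type (one string per row)
col2_checked <- vapply(pieces, `[`, character(1), 1)

df
```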

JMT2080AD