
Given a data frame like this:

data.frame(text = c("separate1: and: more","another 20: 42"))

How can I split each row on the first : only? Expected output:

data.frame(text1 = c("separate1","another 20"), text2 = c("and: more","42"))
Sotos
Nathalie
    Does this answer your question? [Split data frame string column into multiple columns](https://stackoverflow.com/questions/4350440/split-data-frame-string-column-into-multiple-columns) – Claudiu Papasteri Feb 10 '20 at 14:49
    @ClaudiuPapasteri nope. That is not quite the same question, and the fact that the accepted solution there works here is accidental. Closing this as a duplicate of that one could be very misleading – Sotos Feb 10 '20 at 14:52
    @Sotos yes, you are right. I wasn't paying attention: two of the solutions there work, but only coincidentally. Sorry about that. I added my two cents to the solution pool as an apology for the wrong flag. – Claudiu Papasteri Feb 10 '20 at 19:14

8 Answers


In base R, you can use regexpr to find the position of the first :, use that position to extract the substrings on either side of it, and use trimws to remove the surrounding whitespace.

x <- c("separate1: and: more","another 20: 42")

i <- regexpr(":", x)
data.frame(text1 = trimws(substr(x, 1, i-1)), text2 = trimws(substring(x, i+1)))
#       text1     text2
#1  separate1 and: more
#2 another 20        42
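One possible extension of this approach, sketched as a hypothetical helper split_first (not a base R function): when a row contains no :, regexpr returns -1, and the code above then gives an empty text1 and the whole string in text2. The sketch assumes you would rather keep the whole string in text1 and put NA in text2, and it uses fixed = TRUE so the delimiter is matched literally rather than as a regular expression.

```r
# Hypothetical helper, not part of base R: split each string at the first
# occurrence of a literal delimiter. Rows without the delimiter keep the whole
# string in text1 and get NA in text2 (an assumed convention).
split_first <- function(x, delim = ":") {
  x <- as.character(x)
  # Position of the first match (-1 if none); as.vector() drops the
  # match.length and other attributes that regexpr() attaches.
  i <- as.vector(regexpr(delim, x, fixed = TRUE))
  has <- i > 0
  text1 <- ifelse(has, substr(x, 1, i - 1), x)
  text2 <- ifelse(has, substring(x, i + nchar(delim)), NA_character_)
  data.frame(text1 = trimws(text1), text2 = trimws(text2),
             stringsAsFactors = FALSE)
}

split_first(c("separate1: and: more", "another 20: 42", "no delimiter"))
```

The third input is an assumed test case without a colon; it comes back with the full string in text1 and NA in text2.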
GKi
library(reshape2)

df <- data.frame(text = c("separate1: and: more","another 20: 42"))

colsplit(df$text, ":", c("text1", "text2"))
Georgery

You can use str_split_fixed from the stringr package. With n = 2 it returns at most two pieces per string, so only the first delimiter is used for splitting:

stringr::str_split_fixed(df$text, ':', 2)

#     [,1]         [,2]        
#[1,] "separate1"  " and: more"
#[2,] "another 20" " 42"       
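To make the role of n concrete, here is a base R sketch of the same idea. It is a hypothetical split_fixed_base helper, not stringr's implementation: it peels off at most n - 1 pieces at the delimiter, leaves the remainder (colons included) in the last column, and pads with "" when there are fewer pieces. It assumes a literal delimiter (fixed = TRUE), whereas str_split_fixed treats its pattern as a regular expression by default.

```r
# Hypothetical base R helper (not stringr's code): split each string into at
# most n pieces on a literal delimiter, returning a character matrix.
split_fixed_base <- function(x, delim, n) {
  x <- as.character(x)
  out <- matrix("", nrow = length(x), ncol = n)  # missing pieces stay ""
  for (i in seq_along(x)) {
    rest <- x[i]
    col <- 1
    # Peel off at most n - 1 leading pieces; the rest goes in the last column.
    while (col < n) {
      pos <- regexpr(delim, rest, fixed = TRUE)
      if (is.na(pos) || pos == -1) break
      out[i, col] <- substr(rest, 1, pos - 1)
      rest <- substring(rest, pos + nchar(delim))
      col <- col + 1
    }
    out[i, col] <- rest
  }
  out
}

x <- c("separate1: and: more", "another 20: 42")
split_fixed_base(x, ":", 2)  # splits on the first ":" only
split_fixed_base(x, ":", 3)  # also splits "and: more" on its ":"
```

With n = 2 the pieces keep their leading spaces, just like the matrix shown above, so wrap the result in trimws if you need clean columns.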
Sotos
df <- data.frame(text = c("separate1: and: more","another 20: 42"))

df$text1 <- gsub(':.*', '', df$text)
df$text2 <- gsub('^[^:]+: ', '', df$text)

df
#                   text      text1     text2
# 1 separate1: and: more  separate1 and: more
# 2       another 20: 42 another 20        42
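A variant of the same regex idea, sketched with sub() and backreferences instead of two independent gsub() patterns. One anchored pattern captures the text before the first : and everything after it, so both columns come from the same match. [^:]* cannot cross a colon, which is what pins the split to the first one. perl = TRUE is an assumed choice so that \\s* greedily absorbs the space after the colon under Perl's backtracking rules. Rows without a : would be returned unchanged in both columns, because sub() leaves non-matching strings alone.

```r
df <- data.frame(text = c("separate1: and: more", "another 20: 42"),
                 stringsAsFactors = FALSE)

# One anchored pattern: group 1 = everything before the first ":",
# group 2 = everything after it, minus the leading whitespace.
pattern <- "^([^:]*):\\s*(.*)$"
df$text1 <- sub(pattern, "\\1", df$text, perl = TRUE)
df$text2 <- sub(pattern, "\\2", df$text, perl = TRUE)

df
```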
IceCreamToucan

Using tidyr:

library(dplyr)
library(tidyr)

df %>% 
  separate(text, c("a", "b"), sep = ": ", extra = "merge")
#            a         b
# 1  separate1 and: more
# 2 another 20        42
zx8754

Another base R solution:

df <- do.call(rbind,lapply(as.character(df$text), function(x) {
  k <- head(unlist(gregexpr(":",x)),1)
  data.frame(text1 = substr(x,1,k-1),
             text2 = substr(x,k+1,nchar(x)))
}))

which gives:

> df
       text1      text2
1  separate1  and: more
2 another 20         42
ThomasIsCoding

Sorry, @Sotos is right: this isn't a duplicate. Here is another base R solution that splits on the first occurrence of the delimiter.

df <- data.frame(text = c("separate1: and: more","another 20: 42"))

# Avoid naming the result `list`, which would mask base::list
parts <- apply(df, 1, function(x) regmatches(x, regexpr(":", x), invert = TRUE))
df <- data.frame(matrix(unlist(parts), nrow = length(parts), byrow = TRUE))

df
#>           X1         X2
#> 1  separate1  and: more
#> 2 another 20         42

Created on 2020-02-10 by the reprex package (v0.2.1)
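Because regexpr() and regmatches() are both vectorised, the row-wise apply() above is not strictly necessary. The sketch below is an alternative, not the answer's original code: it calls regmatches(..., invert = TRUE) once on the whole column. A row without a : comes back as a single piece, so picking the second element yields NA for that row instead of misaligning the unlist()/matrix() step; the third row is an assumed test case added to show that.

```r
df <- data.frame(text = c("separate1: and: more", "another 20: 42", "no delimiter"),
                 stringsAsFactors = FALSE)

# For each string, the pieces *around* the first ":" found by regexpr().
# A string with no ":" comes back as a single piece (the whole string).
parts <- regmatches(df$text, regexpr(":", df$text), invert = TRUE)

out <- data.frame(
  text1 = trimws(vapply(parts, `[`, character(1), 1)),
  text2 = trimws(vapply(parts, `[`, character(1), 2)),  # NA when there is no ":"
  stringsAsFactors = FALSE
)
out
```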

Claudiu Papasteri

Poor old ?utils::strcapture never gets any respect:

strcapture("^(.+?):(.+$)", df$text, proto=list(text1="", text2=""))
#       text1      text2
#1  separate1  and: more
#2 another 20         42

Bound back onto the original data frame:

cbind(df, strcapture("^(.+?):(.+$)", df$text, proto=list(text1="", text2="")))
#                  text      text1      text2
#1 separate1: and: more  separate1  and: more
#2       another 20: 42 another 20         42
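A small variant of the strcapture() call, under assumptions worth stating: perl = TRUE is used so the lazy (.*?) and the greedy \\s* follow Perl's backtracking rules, and \\s* after the colon means text2 comes back without the leading space. The assumed third row has no :, which shows that strcapture() returns NA in every column for a non-matching row.

```r
df <- data.frame(text = c("separate1: and: more", "another 20: 42", "no delimiter"),
                 stringsAsFactors = FALSE)

# (.*?) stops at the first ":", \\s* swallows the space after it, and
# (.*) takes the rest. Non-matching rows become NA in every column.
res <- strcapture("^(.*?):\\s*(.*)$", df$text,
                  proto = list(text1 = "", text2 = ""),
                  perl = TRUE)
res
```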
thelatemail