I have a text string that I am trying to extract values from. Here is the text:

Charge: Larceny; Charge: Stealing a motor vehicle;

And I am trying to create this:

Charge1     Charge2                        Charge3
Larceny     Stealing a motor vehicle       NA

Any ideas? Right now my code looks like this:

data$charge <- str_extract_all(data, "(?=Charge:)(\\D){4,100}")

But it only created one column. Please help!

Anna Rouw
  • This answer could help you https://stackoverflow.com/questions/4350440/split-data-frame-string-column-into-multiple-columns – Lucas Prestes Jul 30 '18 at 16:56
  • Try with `tidyverse` `tibble(str1) %>% separate_rows(str1, sep= ";\\s*") %>% separate(str1, into = c("col1", "col2"), sep=":\\s*") %>% mutate(col1 = na_if(col1, "")) %>% fill(col1) %>% mutate(col1 = paste0(col1, row_number())) %>% spread(col1, col2)` – akrun Jul 30 '18 at 17:07

4 Answers

If your text is all in the same format this would be pretty easy with tidyverse:

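library(tidyverse)

df <- data.frame(text = c("Charge: Larceny; Charge: Stealing a motor vehicle;",
                          "Charge: some_charge; Charge: another_charge; Charge: something_else"))

df %>%
  # split on the delimiter between charges; rows with fewer charges are padded with NA
  separate(text, c("Charge1", "Charge2", "Charge3"), sep = "; Charge: ", fill = "right") %>%
  # the first column still carries its leading "Charge: " label, so strip it
  mutate(Charge1 = gsub("Charge: ", "", Charge1))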

You may need to clean up the trailing semicolons afterwards, though.
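One possible way to handle the hanging semicolons is to strip them before splitting. This is only a sketch: it reuses the example `df` from this answer, assumes each string has at most one trailing `;`, and the name `cleaned` is made up for illustration.

```r
library(tidyverse)

df <- data.frame(
  text = c("Charge: Larceny; Charge: Stealing a motor vehicle;",
           "Charge: some_charge; Charge: another_charge; Charge: something_else"),
  stringsAsFactors = FALSE
)

cleaned <- df %>%
  # assumption: at most one trailing ";" per string; remove it before splitting
  mutate(text = sub(";\\s*$", "", text)) %>%
  separate(text, c("Charge1", "Charge2", "Charge3"),
           sep = "; Charge: ", fill = "right") %>%
  # drop the "Charge: " label left at the start of the first column
  mutate(Charge1 = sub("^Charge: ", "", Charge1))

cleaned
```

Removing the semicolon first means the last charge in each row no longer ends in `;`, so no second clean-up pass over the columns is needed.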

We can use tidyverse to do this:

library(tidyverse)
tibble(str1) %>%
     separate_rows(str1, sep= ";\\s*") %>%
     separate(str1, into = c("col1", "col2"), sep=":\\s*") %>% 
     mutate(col1 = na_if(col1, "")) %>% 
     fill(col1) %>%
     mutate(col1 = paste0(col1, row_number())) %>%
     spread(col1, col2)
# A tibble: 1 x 3
# Charge1 Charge2                  Charge3
#  <chr>   <chr>                    <chr>  
#1 Larceny Stealing a motor vehicle NA     

data

str1 <- "Charge: Larceny; Charge: Stealing a motor vehicle;"
akrun

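Using base R:

strng <- "Charge: Larceny; Charge: Stealing a motor vehicle;"  # the input string from the question
read.table(text = gsub("\\s*Charge:\\s*", "", strng), sep = ";", fill = TRUE, col.names = paste0("Charge", 1:3))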

  Charge1                  Charge2 Charge3
1 Larceny Stealing a motor vehicle      NA

You may also use strcapture, although it is less flexible than the gsub approach because the number of charges is fixed in the pattern:

strcapture(paste0(rep("\\s*Charge:\\s*([^;]+);", 2), collapse = ""), strng,
           data.frame(charge1 = character(), charge2 = character()))
  charge1                  charge2
1 Larceny Stealing a motor vehicle
Onyambu

This slightly modifies your solution. Note the difference between ?= (lookahead) and ?<= (lookbehind), and that \\D also matches ;, so your pattern does not stop at the end of a charge.

str_extract_all(data, "(?<=Charge: )[^;]+")
[[1]]
[1] "Larceny"                  "Stealing a motor vehicle"
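To make the lookahead/lookbehind difference concrete, here is a small comparison. It is a sketch that uses base R's `regmatches()` and `gregexpr()` with `perl = TRUE` in place of `str_extract_all()`, so it runs without stringr; the variable names are made up for illustration.

```r
x <- "Charge: Larceny; Charge: Stealing a motor vehicle;"

# Lookahead (?=Charge:) only asserts that "Charge:" starts at the current
# position; it consumes nothing, so the match itself begins with "Charge:".
# \\D matches any non-digit, including ";", so the greedy match runs on
# through the rest of the string.
with_lookahead <- regmatches(x, gregexpr("(?=Charge:)\\D{4,100}", x, perl = TRUE))[[1]]

# Lookbehind (?<=Charge: ) requires "Charge: " just before the match,
# and [^;]+ stops at the next semicolon.
with_lookbehind <- regmatches(x, gregexpr("(?<=Charge: )[^;]+", x, perl = TRUE))[[1]]

length(with_lookahead)   # 1: a single match covering the whole string
with_lookbehind          # "Larceny" "Stealing a motor vehicle"
```

This is why the original pattern produced only one value per string.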

str_extract_all() returns a list of character vectors; how to turn that list into a data.frame is covered elsewhere on Stack Overflow.
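For completeness, one way to get from the list to a data.frame is to pad every vector to the same length and bind the results as rows. This is a minimal sketch, not the only approach: it again uses base R's `regmatches()` instead of `str_extract_all()`, the second input string is invented for illustration, and the names `charges`, `padded`, and `result` are hypothetical.

```r
# Hypothetical input: the first string is from the question, the second is made up
x <- c("Charge: Larceny; Charge: Stealing a motor vehicle;",
       "Charge: a; Charge: b; Charge: c;")

# One character vector of charges per input string
charges <- regmatches(x, gregexpr("(?<=Charge: )[^;]+", x, perl = TRUE))

# Pad each vector with NA up to the longest one, then stack them as rows
n <- max(lengths(charges))
padded <- lapply(charges, function(v) c(v, rep(NA_character_, n - length(v))))
result <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(result) <- paste0("Charge", seq_len(n))

result
```

Because the column count is taken from the longest row, nothing needs to be hard-coded, and rows with fewer charges end in NA as in the desired output.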

s_baldur