
Can you help me split the data in column 1 (rawtext) into three columns: column 2 (name), column 3 (timestamp), and column 4 (speech_text)?

The data looks like this:


    column 1                                                         column 2
[1] firstname Lastname:           00:01     text text. text.          0
[2] firstname lastname2:          00:008    text, text text.          0

I need it to look like

column 1                    column 2      column 3

[1] Firstname lastname      00:01         text text. text.
[2] firstname lastname2     00:08         text, text text.
    Possible duplicate of [Split data frame string column into multiple columns](https://stackoverflow.com/questions/4350440/split-data-frame-string-column-into-multiple-columns) – yusuzech Sep 24 '19 at 20:16

2 Answers


If I understand correctly, the OP has a data.frame whose first column should be split into three separate columns. The fields within that column are separated by 4 or more whitespace characters.

The data.table package has the tstrsplit() function, which is shorthand for transpose(strsplit(...)):

library(data.table)
setDT(df)[, c("name", "timestamp", "speech_text") := tstrsplit(column1, "\\s{4,}")]
df
                                                    column1 column2                 name timestamp      speech_text
1: firstname Lastname:           00:01     text text. text.       0  firstname Lastname:     00:01 text text. text.
2: firstname lastname2:          00:008    text, text text.       0 firstname lastname2:    00:008 text, text text.

Note that the new columns have been appended to the original data.frame df in place.

Also note that tstrsplit() has coerced column1 from factor to character by default.
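
The desired output in the question shows the name without its trailing colon. A minimal sketch of that cleanup follows, assuming the same two sample rows as in this answer. The names `parts`, `m`, and `result` are illustrative and not part of the original answer, and base R is used here only so the snippet runs without extra packages. The timestamp 00:008 is left untouched because the question does not say how it should be normalized.

```r
# Sample rows as used in this answer; stringsAsFactors = FALSE keeps column1 as character
df <- data.frame(
  column1 = c("firstname Lastname:           00:01     text text. text.",
              "firstname lastname2:          00:008    text, text text."),
  column2 = 0,
  stringsAsFactors = FALSE
)

# Split each string on runs of four or more whitespace characters
parts <- strsplit(df$column1, "\\s{4,}")

# Bind the pieces row-wise into a character matrix, one row per input string
m <- do.call(rbind, parts)

result <- data.frame(
  name        = sub(":$", "", m[, 1]),  # drop the trailing colon from the name
  timestamp   = m[, 2],
  speech_text = m[, 3],
  stringsAsFactors = FALSE
)
result
```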

Data

df <- data.frame(
  column1 = c("firstname Lastname:           00:01     text text. text.", 
              "firstname lastname2:          00:008    text, text text."),
  column2 = 0)
df
                                                   column1 column2
1 firstname Lastname:           00:01     text text. text.       0
2 firstname lastname2:          00:008    text, text text.       0
Uwe

You can use strsplit with a regex that matches three or more consecutive spaces.

#Replication of the dataframe
l1 = "firstname Lastname:           00:01     text text. text.          0"
l2 = "firstname lastname2:          00:008    text, text text.          0"
df = rbind(l1,l2)

# Using strsplit with Regex to find separation with 3 or more spaces.

df2 = as.data.frame(matrix(unlist(strsplit(df, "\\s{3,}")), nrow = nrow(df), byrow = TRUE), stringsAsFactors = FALSE)

strsplit returns a list, so it is necessary to unlist the result and reshape it with matrix before converting it to a data frame.

The output is:

+----------------------+--------+------------------+----+
|          V1          |   V2   |        V3        | V4 |
+----------------------+--------+------------------+----+
| firstname Lastname:  | 00:01  | text text. text. |  0 |
| firstname lastname2: | 00:008 | text, text text. |  0 |
+----------------------+--------+------------------+----+
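
The unlist-and-matrix approach relies on every row splitting into the same number of pieces; if one row had a different count, the cells would end up misaligned. As a hedged sketch, the variant below checks that assumption first and binds the pieces with do.call(rbind, ...). The names `raw`, `pieces`, and `n_pieces`, and the column names assigned at the end, are illustrative choices rather than part of the original answer.

```r
# The two sample lines from this answer, kept as a plain character vector
raw <- c(
  "firstname Lastname:           00:01     text text. text.          0",
  "firstname lastname2:          00:008    text, text text.          0"
)

# Split on runs of three or more whitespace characters; the result is a list
pieces <- strsplit(raw, "\\s{3,}")

# Guard: stop early if the rows do not all have the same number of pieces
n_pieces <- lengths(pieces)
stopifnot(length(unique(n_pieces)) == 1)

# Bind the list elements row-wise, then convert to a data frame
df2 <- as.data.frame(do.call(rbind, pieces), stringsAsFactors = FALSE)
names(df2) <- c("name", "timestamp", "speech_text", "column2")
df2
```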
Angelo Canepa
  • Thank you! However, I received this error: Error in strsplit(df, "\\s{3,}") : non-character argument – Shehroz Malik Sep 24 '19 at 21:32
  • In my example df is a character matrix. In your case df appears to be a data frame, so strsplit needs to be given the specific column, for example **strsplit(df$V1,"\\s{3,}")** – Angelo Canepa Sep 24 '19 at 21:51
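
The error from the comments can be reproduced and avoided as in the sketch below. It assumes, hypothetically, that the raw text sits in a column named V1 and that the column was read in as a factor; the as.character() call is a defensive step for that case, since strsplit only accepts character input.

```r
# Hypothetical data frame with the raw text stored as a factor column V1
df <- data.frame(
  V1 = c("firstname Lastname:           00:01     text text. text.          0",
         "firstname lastname2:          00:008    text, text text.          0"),
  stringsAsFactors = TRUE
)

# Passing the whole data frame fails; capture the message instead of stopping
err <- tryCatch(
  strsplit(df, "\\s{3,}"),
  error = function(e) conditionMessage(e)
)
err

# Passing a single column, converted to character, works
pieces <- strsplit(as.character(df$V1), "\\s{3,}")
pieces[[1]]
```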