-1

I have a column in dataframe df with value 'name>year>format'. Now I want to split this column by > and add those values to new columns named as name, year, format. How can I do this in R.

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
SNT
  • 1,283
  • 3
  • 32
  • 78

3 Answers3

1

An option is read.table in base R

cbind(df, read.table(text = as.character(df$column), sep=">", 
         header = FALSE, col.names = c("name", "year", "format")))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • In case I have 3 different columns to split like this I do the same thrice ? – SNT Jan 03 '20 at 23:29
  • @SNT You can create a function and apply or use it in `lapply` i.e. `lapply(df[1:3], function(x) read.table(text = as.character(x), sep=">", header = FALSE, col.names = c('one', 'two', 'three')))` – akrun Jan 03 '20 at 23:37
1

You can do that easily using separate function in tidyr;

library(tidyr)
library(dplyr)

data <- 
  data.frame(
    A = c("Joe>1993>student")
  )

data %>%
  separate(A, into = c("name", "year", "format"), sep = ">", remove = FALSE)

# A name year  format
# Joe>1993>student  Joe 1993 student

If you do not want the original column in the result dataframe change remove to TRUE

Nareman Darwish
  • 1,251
  • 7
  • 14
1

In case your data is big, it would be a good idea to use data.table as it is very fast.

If you know how many fields your "combined" column has:

Suppose the column has 3 fields, and you know it:

library(data.table)

# the 1:3 should be replaced by 1:n, where n is the number of fields
dt1[, paste0("V", 1:3) := tstrsplit(y, split = ">", fixed = TRUE)]

If you DON'T know in advance how many fields the column has:

Now we can get some help from the stringi package:

library(data.table)
library(stringi)

maxFields <- dt2[, max(stri_count_fixed(y, ">")) + 1]
dt2[, paste0("V", 1:maxFields) := tstrsplit(y, split = ">", fixed = TRUE, fill = NA)]

Data used:

library(data.table)

dt1 <- data.table(x = c("A", "B"), y = c("letter>2018>pdf", "code>2020>Rmd"))

dt2 <- rbind(dt1, data.table(x = "C", y = "report>2019>html>pdf"))
PavoDive
  • 6,322
  • 2
  • 29
  • 55