
How can I split the author name from the study year and put them in separate columns? My data frame looks like this:

df <- 
Study                N
John et al., 2003    10
Nich et al., 1988    15

Result should be:

df <-
Study         Year    N
John et al.,  2003    10
Nich et al.,  1988    15

I am using R.

Amer
  • Do any of the answers [here](http://stackoverflow.com/questions/7069076/split-column-at-delimiter-in-data-frame) help? – alexforrence Nov 01 '15 at 22:56
  • @alexforrence The answers there deal with specific case. I wasn't able to apply them to my dataframe. My data frame has other columns that I don't want to break. – Amer Nov 01 '15 at 23:14

3 Answers


You could use regular expressions to pull out each part in turn:

df$Year <- gsub("^.*, ", "", df$Study) # remove everything up to and including the last ", "
df$Study <- gsub(",.*$", "", df$Study) # remove the first "," and everything after it
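Since the asker mentions other columns that must stay intact, here is a minimal base R sketch of the same idea on a reproducible data frame. The `Country` column and its values are made up purely for illustration. Unlike the snippet above, this variant keeps the trailing comma in `Study` (as in the desired output) and stores `Year` as an integer; it assumes every `Study` value ends in a four-digit year.

```r
# Hypothetical sample data; 'Country' stands in for the columns that must not be split
df <- data.frame(
  Study = c("John et al., 2003", "Nich et al., 1988"),
  N = c(10, 15),
  Country = c("US", "UK"),
  stringsAsFactors = FALSE
)

# Year: the four digits at the end of the string, converted to integer
df$Year <- as.integer(sub("^.*, *([0-9]{4})$", "\\1", df$Study))

# Study: drop the trailing year but keep the comma
df$Study <- sub(" *[0-9]{4}$", "", df$Study)

# Reorder so that Year sits right after Study; other columns are untouched
df <- df[, c("Study", "Year", "N", "Country")]
print(df)
```

Only `Study` is modified and only `Year` is added, so any other columns in the data frame pass through unchanged.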
Thierry

You can also do this with extract from tidyr.

library(dplyr)
library(tidyr)
df %>%
  extract(Study, c("Author", "Year"), "(.*), ([0-9]{4})")
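If tidyr is not available, a rough base R analogue of the capture-group approach is `strcapture()` from the utils package (R 3.4.0 or later, as far as I know). This is only a sketch under the assumption that every `Study` value has the form "name, YYYY"; the sample data below is reconstructed from the question.

```r
# Sample data reconstructed from the question
df <- data.frame(
  Study = c("John et al., 2003", "Nich et al., 1988"),
  N = c(10, 15),
  stringsAsFactors = FALSE
)

# proto supplies the name and type of each captured group
parts <- strcapture(
  "^(.*), ([0-9]{4})$",
  df$Study,
  proto = data.frame(Author = character(), Year = integer(),
                     stringsAsFactors = FALSE)
)

# Replace the Study column with the captured pieces, keeping all other columns
result <- cbind(parts, df[setdiff(names(df), "Study")])
print(result)
```

As with the `extract` call above, the comma is not part of the first capture group, so `Author` comes out as "John et al." without the trailing comma.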
bramtayl

We can also use data.table. We convert the 'data.frame' to a 'data.table' (setDT(df)), split the 'Study' column on the space that follows the comma using tstrsplit, and rename the two resulting columns with setnames.

library(data.table) # v1.9.6+
setnames(setDT(df)[, c(tstrsplit(Study, '(?<=,) ', perl=TRUE), 
                list(N=N))], 1:2, c('Study', 'Year'))[]
#          Study Year  N
#1: John et al., 2003 10
#2: Nich et al., 1988 15
akrun