
I'm facing the following problem. I've got a table with a column called title.

The title column contains values like "To kill a mockingbird (1960)".

So the format of the column is [title] ([year]). What I need are two columns, title and year, with the year without brackets.

Another complication is that some titles themselves contain brackets. However, the last 6 characters of every row are always the year wrapped in brackets.

How do I create the two columns, title and year?

What I have is:

Books$title <- c("To kill a mockingbird (1960)", "Harry Potter and the order of the phoenix (2003)", "Of mice and men (something something) (1937)")

title
To kill a mockingbird (1960)
Harry Potter and the order of the phoenix (2003)
Of mice and men (something something) (1937)
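A reproducible version of this input, assuming Books is meant to be a data frame with a single character column title (assigning Books$title on its own fails if Books does not exist yet):

```r
# Build the example input as a data frame; stringsAsFactors = FALSE keeps
# the titles as character strings in R versions before 4.0
Books <- data.frame(
  title = c("To kill a mockingbird (1960)",
            "Harry Potter and the order of the phoenix (2003)",
            "Of mice and men (something something) (1937)"),
  stringsAsFactors = FALSE
)
print(Books)
```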

What I need is:

Books$title <- c("To kill a mockingbird", "Harry Potter and the order of the phoenix", "Of mice and men (something something)")
Books$year <- c("1960", "2003", "1937")

title                                             year
To kill a mockingbird                             1960
Harry Potter and the order of the phoenix         2003
Of mice and men (something something)             1937

Uwe
John Doe
You should provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Vincent Bonhomme Oct 01 '17 at 12:43

3 Answers


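We can work around the brackets inside some titles by taking substrings relative to the end of the string, since the year always occupies the last 6 characters.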

First we recreate your data.frame:

df <- read.table(header = TRUE, sep = "\n", stringsAsFactors = FALSE,
text="
Title
To kill a mockingbird (1960)
Harry Potter and the order of the phoenix (2003)
Of mice and men (something something) (1937)")

Then we create a new data frame. The first column, Title, is everything in df$Title except the last 7 characters (the bracketed year plus the space before it). The second column, Year, takes the last 7 characters of df$Title and removes the space and the opening and closing brackets. (gsub("[[:punct:][:space:]]", "", ...) would have worked as well.)

data.frame(Title=substr(df$Title, 1, nchar(df$Title)-7),
           Year=gsub(" |\\(|\\)", "", substr(df$Title, nchar(df$Title)-6, nchar(df$Title))))


                                      Title Year
1                     To kill a mockingbird 1960
2 Harry Potter and the order of the phoenix 2003
3     Of mice and men (something something) 1937
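To see where the offsets 7 and 6 come from, here is the arithmetic for the first row: the string has 28 characters, the separating space is at position 22 and the bracketed year occupies positions 23 to 28.

```r
x <- "To kill a mockingbird (1960)"
n <- nchar(x)                          # 28
title <- substr(x, 1, n - 7)           # "To kill a mockingbird" (drops " (1960)")
tail7 <- substr(x, n - 6, n)           # " (1960)", i.e. the last 7 characters
year  <- gsub(" |\\(|\\)", "", tail7)  # "1960"
# The character-class variant removes punctuation and whitespace in one go
year2 <- gsub("[[:punct:][:space:]]", "", tail7)  # "1960"
```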

Does that solve your problem?
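An alternative that is not part of the answer above is to anchor a regular expression at the end of the string, so brackets inside the title are left untouched and stray trailing spaces are tolerated. A sketch, assuming every year is exactly four digits; note that sub() returns non-matching rows unchanged, so Year would then contain the whole title (check with grepl() first if that can happen):

```r
df <- data.frame(
  Title = c("To kill a mockingbird (1960)",
            "Harry Potter and the order of the phoenix (2003)",
            "Of mice and men (something something) (1937)"),
  stringsAsFactors = FALSE
)

# Group 1: the title; group 2: four digits inside "( )" at the very end
pattern <- "^(.*) \\(([0-9]{4})\\)[[:space:]]*$"

result <- data.frame(
  Title = sub(pattern, "\\1", df$Title),
  Year  = sub(pattern, "\\2", df$Title),
  stringsAsFactors = FALSE
)
result
```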

Vincent Bonhomme

Try using substrRight(df$Title, 6) (a helper from the thread linked below, not a base R function) to extract the last 6 characters, i.e. the year with its brackets, and save the result as a new column.

Extracting the last n characters from a string in R
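A sketch of this approach. The definition of substrRight() below is an assumption (a common way to write such a helper with substr() and nchar()), not necessarily the exact code from the linked thread. Because substr() and nchar() are vectorized, the helper can be applied to the whole column without a loop:

```r
# Hypothetical helper (assumed definition): last n characters of each string
substrRight <- function(x, n) {
  substr(x, nchar(x) - n + 1, nchar(x))
}

df <- data.frame(
  Title = c("To kill a mockingbird (1960)",
            "Harry Potter and the order of the phoenix (2003)",
            "Of mice and men (something something) (1937)"),
  stringsAsFactors = FALSE
)

df$Year  <- substrRight(df$Title, 6)                   # "(1960)", "(2003)", "(1937)"
df$Year  <- gsub("[()]", "", df$Year)                  # strip the brackets
df$Title <- substr(df$Title, 1, nchar(df$Title) - 7)   # drop " (yyyy)"
df
```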


Similar to @Vincent Bonhomme:

I assume that the data are in a text file, here called so.dat, which I read into a data.frame that also contains two columns for the title and year to be extracted. Then I use substring() to separate the title from the fixed-length year at the end, dropping the brackets around the year, as the OP wants:

titles      <- data.frame( orig = readLines( "so.dat" ), 
               text = "", yr = "", stringsAsFactors = FALSE )
titles$text <- substring( titles[ , 1 ], 
               1, nchar( titles[ , 1 ] ) - 7 )
titles$yr   <- substring( titles[ , 1 ], 
               nchar( titles[ , 1 ] ) - 4, nchar( titles[ , 1 ] ) - 1 )

The original column can be kept or dropped, depending on further need.
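To try the substring() approach without an existing so.dat, here is a self-contained sketch that first writes the sample titles to a temporary file standing in for it (the file exists only for illustration). The year is taken from positions n - 4 to n - 1, which drops the brackets as the question asks:

```r
# Write the sample titles to a temporary file standing in for so.dat
so_dat <- tempfile(fileext = ".dat")
writeLines(c("To kill a mockingbird (1960)",
             "Harry Potter and the order of the phoenix (2003)",
             "Of mice and men (something something) (1937)"), so_dat)

titles <- data.frame(orig = readLines(so_dat),
                     text = "", yr = "", stringsAsFactors = FALSE)
n <- nchar(titles$orig)
titles$text <- substring(titles$orig, 1, n - 7)      # title without " (yyyy)"
titles$yr   <- substring(titles$orig, n - 4, n - 1)  # the four year digits only
titles[, c("text", "yr")]
```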

vaettchen