
As in the example below, I am trying to take a substring of the Video_full column in a data.frame (video_data_2) I am working on. I want to keep all the characters after the period. The period is always present, there is only one, and its position differs from value to value.

     Date                     Video_full      Instances   
1 Apr 1, 2010  installs/AA.intro_video_1      546         
2 Apr 1, 2010  installs/ABAC.intro_video_2    548      

I got substring to work:

video_data_2$Video_full <- substring(video_data_2$Video_full,11)

And strsplit also:

strsplit("installs/AA.intro_video_1 ",'[.]')

I'm just not able to figure out how to start the substring in a dynamic position or only keep the second value returned by strsplit.

Thanks for any help you can offer for a simple question.

Marek
analyticsPierce

4 Answers


You can use sub():

video_data_2$Video_full <- sub("^.*\\.","", video_data_2$Video_full)
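
A minimal sketch of how this pattern behaves on the two sample rows from the question. The data frame is rebuilt by hand here purely for illustration, and `after_period` is a hypothetical name. The pattern is greedy, so it strips everything through the last period, which is the same as the only period when each value contains exactly one.

```r
# Sample values copied from the question; the data frame is rebuilt by hand for illustration.
video_data_2 <- data.frame(
  Date = c("Apr 1, 2010", "Apr 1, 2010"),
  Video_full = c("installs/AA.intro_video_1", "installs/ABAC.intro_video_2"),
  Instances = c(546, 548),
  stringsAsFactors = FALSE
)

# "^.*\\." matches everything from the start of the string through the last literal period.
# Replacing that match with "" leaves only the text after the period.
after_period <- sub("^.*\\.", "", video_data_2$Video_full)
print(after_period)
```

Assigning the result back to `video_data_2$Video_full`, as in the line above, overwrites the original column, so keeping it in a separate variable first can be a safer way to check the output.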
kohske

Another way to use strsplit

sapply(strsplit(video_data_2$Video_full, "\\."), "[", 2)

which is shorthand for

sapply(strsplit(video_data_2$Video_full, "\\."), function(x) x[2])
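
A small sketch of the same idea for a column stored as a factor, which is an assumption about the data here. `strsplit()` only accepts character vectors, so the factor is converted first. The names `video_full`, `parts`, and `second_piece` are hypothetical, and `vapply()` is swapped in as a stricter alternative to `sapply()`.

```r
# Hypothetical factor column, mimicking how older R versions stored strings in a data.frame by default.
video_full <- factor(c("installs/AA.intro_video_1", "installs/ABAC.intro_video_2"))

# strsplit() fails with "non-character argument" on a factor, so convert it first.
# fixed = TRUE treats "." as a literal period rather than a regex wildcard.
parts <- strsplit(as.character(video_full), ".", fixed = TRUE)

# Take the second piece of each split; vapply() checks that every result is a single string.
second_piece <- vapply(parts, function(x) x[2], character(1))
print(second_piece)
```

If a value had no period, `x[2]` would be `NA` rather than an error, which may or may not be the behaviour you want.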
Marek
  • +1 I like very much the use of "[". What does it mean? (and where is the explanation in R help?) – gd047 Jun 09 '10 at 09:17
  • @gd047 The indexing operator "[" is a function, and you can reach its help with `?"["` (or `help("[")`). You can call it like any other function, e.g. `` `[`(letters, 3:5) ``, and it is especially handy in cases like this question, in `do.call`, or anywhere else you must supply a function name directly. – Marek Jun 09 '10 at 09:38
  • thank you for providing this answer. I am not sure why but when I ran this function I got a 'non-character argument' error. Any thoughts on what would cause that? – analyticsPierce Jun 11 '10 at 05:13
  • I suppose `video_data_2$Video_full` is a `factor`. So try `sapply(strsplit(as.character(video_data_2$Video_full), "\\."), "[", 2)` – Marek Jun 11 '10 at 07:41

Try stringr

library(stringr)
str_split_fixed(video_data_2$Video_full, "\\.", n = 2)[, 2]
hadley
  • This solution is much slower than the others. You can see this with a vector of length 10,000. – Marek Jun 10 '10 at 15:29
  • Prove it! Plus why worry about speed unless you have to. – hadley Jun 10 '10 at 20:33
  • thank you for your answer. I went through your docs for this package and would get a lot of use out of it. However, I was not able to get it to install. I'm using the Rbundle in textmate and tried install.packages("stringr", repos = "http://cran.r-project.org/src/contrib/stringr_0.3.tar.gz", type="source"), the message I got back said the package was unavailable. Sorry if this should be a separate question. – analyticsPierce Jun 11 '10 at 05:27
  • You should only need `install.packages("stringr")`. That path is not a valid repository. – hadley Jun 11 '10 at 14:18

An approach using strsplit:

video_data_2$Video_full <- sapply(strsplit(video_data_2$Video_full, "\\."),head)[2,]
gd047
  • Similar to the first answer provided by @Marek, I received a 'non-character argument' error when I tried this. Any thoughts on what might cause it? – analyticsPierce Jun 11 '10 at 05:19