I have a column filename in a dataframe that looks like this:

/testData/THQ/TAIRATE.20030314.190000.tif
/testData/THQ/TAIRATE.20030314.200000.tif
/testData/THQ/TAIRATE.20030314.210000.tif
/testData/THQ/TAIRATE.20030314.220000.tif

I want to extract the timestamp from each filename and store it in a new column, but I am not familiar with regex. So far I have this:

tdat %>%
  dplyr::rowwise() %>% 
  dplyr::mutate(timestamp = str_extract(as.character(filename), "[^//TAIRATE]+$")) %>% 
  glimpse()

Result

.20030314.190000.tif
.20030314.200000.tif
.20030314.210000.tif
.20030314.220000.tif

Expected result

20030314190000
20030314200000
20030314210000
20030314220000

Question: How can I write the correct regex or is there a better way?

maximusdooku

2 Answers

str_extract and similar functions are vectorized, so you don't need rowwise().

In this case, you can do this in base R using sub.

sub('.*TAIRATE\\.(\\d+)\\.(\\d+).*', '\\1\\2', df$filename)
#[1] "20030314190000" "20030314200000" "20030314210000" "20030314220000"
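As a minimal sketch of how this could be applied to the whole column at once, the example below builds a small stand-in data frame (the name `tdat` and its contents are assumed from the question, not real data) and assigns the result to a new `timestamp` column:

```r
# Hypothetical data frame mirroring the question's `filename` column
tdat <- data.frame(
  filename = c(
    "/testData/THQ/TAIRATE.20030314.190000.tif",
    "/testData/THQ/TAIRATE.20030314.200000.tif"
  ),
  stringsAsFactors = FALSE
)

# sub() is vectorized: a single call processes every row, no rowwise() needed.
# Group 1 captures the date digits, group 2 the time digits.
tdat$timestamp <- sub(".*TAIRATE\\.(\\d+)\\.(\\d+).*", "\\1\\2", tdat$filename)

print(tdat$timestamp)
```

The same `sub()` call should also work unchanged inside `dplyr::mutate()`. For reference, the pattern in the question, `[^//TAIRATE]+$`, is a negated character class: it matches a trailing run of characters that are none of `/`, `T`, `A`, `I`, `R`, `E`, which is why it returned `.20030314.190000.tif` rather than the digits alone.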
Ronak Shah
  • @maximusdooku Same note as in [my comment](https://stackoverflow.com/questions/61405549/how-can-i-extract-a-string-rowwise-using-regex#comment108626908_61405549): This will have a side effect: if there is no match found, you will end up with the whole file name unchanged – Wiktor Stribiżew Apr 24 '20 at 09:55
  • @WiktorStribiżew Thank you for that cautionary note! Also, as a sidenote - what's a good resource to learn just enough about regex to make it work? I always avoided learning it. – maximusdooku Apr 24 '20 at 09:57
  • @maximusdooku I do not know your level of regex knowledge :) so that I can only suggest doing all lessons at [regexone.com](http://regexone.com/), reading through [regular-expressions.info](http://www.regular-expressions.info), [regex SO tag description](http://stackoverflow.com/tags/regex/info) (with many other links to great online resources), and the community SO post called [What does the regex mean](http://stackoverflow.com/questions/22937618/reference-what-does-this-regex-mean). Also, [rexegg.com](http://rexegg.com) is worth having a look at. – Wiktor Stribiżew Apr 24 '20 at 10:03
  • @maximusdooku For R, also see [this answer of mine](https://stackoverflow.com/a/47251004/3832970). – Wiktor Stribiżew Apr 24 '20 at 10:03
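One possible way to avoid the side effect raised in the comments (an unmatched filename coming back unchanged) is to return `NA` when the pattern does not match. The sketch below uses base R's `regexec()` and `regmatches()`; the helper name `extract_timestamp` is hypothetical, and the pattern assumes an 8-digit date followed by a 6-digit time, as in the sample filenames:

```r
# Hypothetical helper: returns the joined timestamp, or NA if nothing matches.
# Assumes names of the form TAIRATE.<8 digits>.<6 digits>.<extension>
extract_timestamp <- function(x) {
  pattern <- "TAIRATE\\.(\\d{8})\\.(\\d{6})\\."
  matches <- regmatches(x, regexec(pattern, x))
  vapply(
    matches,
    function(g) {
      # g holds the full match plus the two capture groups when a match exists
      if (length(g) == 3) paste0(g[2], g[3]) else NA_character_
    },
    character(1)
  )
}

files <- c(
  "/testData/THQ/TAIRATE.20030314.190000.tif",
  "/testData/THQ/README.txt"
)
result <- extract_timestamp(files)
print(result)
```

Rows that come back as `NA` can then be filtered or inspected, instead of silently carrying a full path in the timestamp column.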

Certainly less elegant than @akrun's solution but this one works too:

library(stringr)  # provides str_extract_all()
paste0(unlist(str_extract_all(filename, "[0-9]+")), collapse = "")

Data:

filename <- "/testData/THQ/TAIRATE.20030314.190000.tif"
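Note that with `collapse = ""` the expression above joins all digit runs into one string, so it suits a single filename; applied to a vector it would merge every row's digits together. A minimal base R sketch that keeps one result per element, assuming the only digits in the file's base name belong to the timestamp, could look like this:

```r
# Sample vector taken from the question's filenames
filename <- c(
  "/testData/THQ/TAIRATE.20030314.190000.tif",
  "/testData/THQ/TAIRATE.20030314.200000.tif"
)

# basename() drops the directory part so digits in folder names are ignored;
# gsub() then deletes every non-digit character, element by element.
timestamp <- gsub("\\D", "", basename(filename))

print(timestamp)
```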
Chris Ruehlemann