
I have the string

string<-c("file_skimmed_2019-12-09.csv")

I would like to extract the date from this string. In this case: 2019-12-09. I used

strapplyc(listcsv[1], "[skimmed_.*csv]", simplify = TRUE)

but I obtained a wrong result.

camille
Mark
  • Have you seen [this](https://stackoverflow.com/questions/46528912/extract-date-from-a-given-string-in-r)? – jazzurro Jan 24 '20 at 15:44
  • Maybe this `sub(".*_(.*).csv", "\\1", string)`? – Dan Jan 24 '20 at 15:46
  • Both solutions work! Thank you very much! – Mark Jan 24 '20 at 15:49
  • This will match the exact date format that you have: `regmatches(string, regexpr("\\d{4}\\-\\d{2}\\-\\d{2}", string))` – TheSciGuy Jan 24 '20 at 15:50
  • If you use stringr, you could also try `sapply(stringr::str_extract_all(string, "\\d{4}-\\d{2}-\\d{2}"), "[[", 1)` – Caroline Jan 24 '20 at 15:56
  • Where does `strapplyc` come from? What's `listcsv[1]`, is that the same as `string`? Make sure the code you post here is consistent and reproducible. Also, try to be more specific than "a wrong result" – camille Jan 24 '20 at 15:58

2 Answers


This works for me:

library(stringr)

string<-c("file_skimmed_2019-12-09.csv")

str_extract(string, '[0-9]{4}-[0-9]{2}-[0-9]{2}')

I get

 [1] "2019-12-09"

If this pattern is unfamiliar, read up on regular expressions. str_extract is vectorized, so you can pass it a whole character vector or data frame column directly; for the elements of a list, sapply or one of the map functions will apply it to each element.
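As a minimal sketch of extracting the date from several file names at once, the following uses base R's regexpr and regmatches (the approach suggested in the comments) instead of stringr, so nothing needs to be installed. The `files` vector is hypothetical and only there for illustration.

```r
# Hypothetical file names, used only for illustration
files <- c("file_skimmed_2019-12-09.csv",
           "file_skimmed_2020-01-24.csv")

# Four digits, two digits, two digits, separated by hyphens
pattern <- "[0-9]{4}-[0-9]{2}-[0-9]{2}"

# regexpr() locates the first match in each element;
# regmatches() pulls out the matched text
dates <- regmatches(files, regexpr(pattern, files))
dates
```

Note that regmatches drops elements with no match, so the result can be shorter than the input if some names contain no date.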

NelsonGon
Tomas -

1) as.Date Assuming you want to extract the date so that you can convert it to Date class, remove everything up to and including the last underscore and then convert the remainder with as.Date. This relies on the fact that as.Date ignores junk characters at the end of the string. It needs only a simple regular expression and no packages.

as.Date(sub(".*_", "", string))
## [1] "2019-12-09"
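A short sketch of why the Date class can be worth having, assuming the same `string` as in the question: the result supports date arithmetic, and format() turns it back into a character string when one is needed.

```r
string <- "file_skimmed_2019-12-09.csv"

# Strip everything through the last underscore, leaving "2019-12-09.csv";
# as.Date() parses the leading date and ignores the trailing ".csv"
d <- as.Date(sub(".*_", "", string))

class(d)                 # "Date"
next_day <- d + 1        # date arithmetic works on Date objects
as_text <- format(d, "%Y-%m-%d")  # back to a character string
```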

2) strapplyc To get a string result with strapplyc, as attempted in the question, use the pattern below, which is likely sufficient:

library(gsubfn)

strapplyc(string, "....-..-..", simplify = TRUE)
## [1] "2019-12-09"

or you can be even more specific with this pattern:

strapplyc(string, "\\d{4}-\\d{2}-\\d{2}", simplify = TRUE)
## [1] "2019-12-09"

3) trimws Using R 3.6 or later we can use trimws to trim away all non-digits from the beginning and end. This will work as long as there are no digits before or after the date (which is satisfied in the example in the question). This does not use any packages.

trimws(string, whitespace = "\\D")
## [1] "2019-12-09"
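To illustrate the caveat about other digits, here is a small sketch with a hypothetical file name that has a digit before the date. trimws stops trimming at the first digit it meets, so the prefix survives, whereas an explicit date pattern still isolates the date.

```r
# Hypothetical name with a digit ahead of the date
tricky <- "v2_file_2019-12-09.csv"

# Trimming non-digits stops at the "2" in "v2" (requires R 3.6 or later)
trimmed <- trimws(tricky, whitespace = "\\D")

# An explicit date pattern is not affected by the extra digit
matched <- regmatches(tricky, regexpr("\\d{4}-\\d{2}-\\d{2}", tricky))

trimmed  # "2_file_2019-12-09"
matched  # "2019-12-09"
```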

4) file_path_sans_ext Use the indicated function to remove the extension and then remove everything up to the underscore. Note that the tools package is included with R so there is nothing to install. The regular expression is the same simple one used in (1).

library(tools)
sub(".*_", "", file_path_sans_ext(string))
## [1] "2019-12-09"

5) gsub Remove everything up to and including the last underscore, and the trailing .csv extension. No packages are used.

gsub(".*_|\\.csv$", "", string)
## [1] "2019-12-09"
G. Grothendieck