I have the string
string<-c("file_skimmed_2019-12-09.csv")
I would like to extract the date from this string, in this case 2019-12-09. I used
library(gsubfn)
strapplyc(string, "[skimmed_.*csv]", simplify = TRUE)
but I obtained the wrong result.
This works for me:
library(stringr)
string<-c("file_skimmed_2019-12-09.csv")
str_extract(string, '[0-9]{4}-[0-9]{2}-[0-9]{2}')  # yyyy-mm-dd
I get
[1] "2019-12-09"
If this pattern is unfamiliar, it is worth reading up on regular expressions. str_extract is vectorized, so it can be applied directly to a character vector such as a column of your dataset; for a list, you can wrap it in sapply or one of the purrr map functions.
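Because the extraction is vectorized, a whole vector of file names can be handled in one call. The sketch below shows the same idea in base R (regexpr plus regmatches) so that it runs without stringr; the two extra file names are made up for illustration, and the vapply call is one possible way to handle a list.

```r
# Only the first name comes from the question; the others are hypothetical
files <- c("file_skimmed_2019-12-09.csv",
           "file_skimmed_2020-03-15.csv",
           "file_skimmed_2021-07-01.csv")

# Four digits, dash, two digits, dash, two digits
pat <- "[0-9]{4}-[0-9]{2}-[0-9]{2}"

# regexpr() finds the first match in each element and regmatches() extracts it.
# Elements with no match are dropped, so the result can be shorter than files.
dates <- regmatches(files, regexpr(pat, files))
dates
## [1] "2019-12-09" "2020-03-15" "2021-07-01"

# For a list of file names, vapply() applies the extraction element by element
file_list <- as.list(files)
dates_from_list <- vapply(file_list,
                          function(f) regmatches(f, regexpr(pat, f)),
                          character(1))
```

The vapply version insists on exactly one match per element, so it fails loudly on a name without a date instead of silently shortening the result.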
1) Assuming that the reason you want to extract that string is so that you can convert it to Date class, remove everything up to and including the last underscore and then convert to Date class. This relies on the fact that as.Date ignores junk characters at the end. It uses only a simple regular expression and no packages.
as.Date(sub(".*_", "", string))
## [1] "2019-12-09"
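A short sketch of why the Date conversion pays off, assuming a second, hypothetical file name: once the values are Dates they can be subtracted and sorted directly. The explicit format argument is optional here, but it documents the expected layout, and as.Date still ignores the trailing ".csv".

```r
# The second file name is hypothetical, for illustration only
files <- c("file_skimmed_2019-12-09.csv", "file_skimmed_2020-01-31.csv")

# Drop everything through the last underscore, then parse as yyyy-mm-dd;
# the leftover ".csv" after the date is ignored by the parser
d <- as.Date(sub(".*_", "", files), format = "%Y-%m-%d")
d
## [1] "2019-12-09" "2020-01-31"

# Ordinary date arithmetic now works
diff(d)
## Time difference of 53 days

# ...and so does sorting the files by their embedded date
newest_first <- files[order(d, decreasing = TRUE)]
```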
2) strapplyc To use strapplyc, as was attempted in the question, and get a string result, this code is likely sufficient:
library(gsubfn)
strapplyc(string, "....-..-..", simplify = TRUE)
## [1] "2019-12-09"
or you can be even more specific with this pattern:
strapplyc(string, "\\d{4}-\\d{2}-\\d{2}", simplify = TRUE)
## [1] "2019-12-09"
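The pattern in the question, [skimmed_.*csv], fails because square brackets define a character class: it matches one character at a time from that set, not the word "skimmed" followed by anything up to "csv". This base R sketch uses gregexpr plus regmatches, which return every match much as strapplyc does, to show what the class actually matches.

```r
string <- "file_skimmed_2019-12-09.csv"

# Inside [...] each character is just a member of a set, so this matches
# single characters from {s, k, i, m, e, d, _, ., *, c, v}
bad <- regmatches(string, gregexpr("[skimmed_.*csv]", string))[[1]]
bad
## [1] "i" "e" "_" "s" "k" "i" "m" "m" "e" "d" "_" "." "c" "s" "v"

# Describing the date itself gives the intended single match
good <- regmatches(string, gregexpr("\\d{4}-\\d{2}-\\d{2}", string))[[1]]
good
## [1] "2019-12-09"
```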
3) trimws Using R 3.6 or later, we can use trimws to trim away all non-digits from the beginning and end. This works as long as there are no digits before or after the date (which is the case in the example in the question). No packages are used.
trimws(string, whitespace = "\\D")
## [1] "2019-12-09"
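To make the caveat concrete, here is a sketch with a hypothetical file name that has a digit in its prefix. trimws stops trimming at the first digit, so part of the prefix survives, while a pattern that describes the date itself is unaffected.

```r
# Hypothetical name with a digit before the date
risky <- "file2_skimmed_2019-12-09.csv"

# Leading non-digits stop at the "2", so "2_skimmed_" is left in front
trimmed <- trimws(risky, whitespace = "\\D")
trimmed
## [1] "2_skimmed_2019-12-09"

# Matching the date pattern directly still isolates the date
safe <- regmatches(risky, regexpr("\\d{4}-\\d{2}-\\d{2}", risky))
safe
## [1] "2019-12-09"
```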
4) file_path_sans_ext Use the indicated function to remove the extension and then remove everything up to and including the last underscore. Note that the tools package ships with R, so there is nothing to install. The regular expression is the same simple one used in (1).
library(tools)
sub(".*_", "", file_path_sans_ext(string))
## [1] "2019-12-09"
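If the files might be compressed (a hypothetical "file_skimmed_2019-12-09.csv.gz", say), file_path_sans_ext removes only the last extension by default. Its compression argument strips a .gz, .bz2 or .xz suffix first, as this sketch shows.

```r
library(tools)

# Hypothetical gzip-compressed variant of the file name
gz <- "file_skimmed_2019-12-09.csv.gz"

# Default: only ".gz" is removed, so ".csv" is left after the date
plain <- sub(".*_", "", file_path_sans_ext(gz))
plain
## [1] "2019-12-09.csv"

# compression = TRUE removes ".gz" first and then ".csv"
with_gz <- sub(".*_", "", file_path_sans_ext(gz, compression = TRUE))
with_gz
## [1] "2019-12-09"
```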
5) Remove everything before and after the date. No packages are used.
gsub(".*_|\\.csv$", "", string)  # escape the dot so it matches a literal "."
## [1] "2019-12-09"
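If some file names might not contain a date at all, a small wrapper can return NA for them instead of dropping them or returning the unchanged name. extract_file_date is a hypothetical helper, not part of any package; it is a sketch that combines the \d{4}-\d{2}-\d{2} pattern from (2) with the Date conversion from (1).

```r
# Hypothetical helper: returns a Date vector, NA where no yyyy-mm-dd is found
extract_file_date <- function(x) {
  m <- regexpr("\\d{4}-\\d{2}-\\d{2}", x)  # -1 for elements with no match
  out <- rep(NA_character_, length(x))
  out[m > 0] <- regmatches(x, m)           # regmatches returns matched elements only
  as.Date(out, format = "%Y-%m-%d")
}

# First element becomes 2019-12-09, second (no date) becomes NA
r <- extract_file_date(c("file_skimmed_2019-12-09.csv", "notes.txt"))
r
```

Keeping the output the same length as the input makes it safe to assign the result straight back into a data frame column.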