Hey, I'm new to R and working on a small project in RStudio, and I need some help. I have data that looks similar to the following:

x=training 1 - Monday- 12h30-15h00

Saturday 16h-20h

Training 2 - Friday-06h-08h0

training 1 - Tuesday - 13h30-15h00

Sunday 16h-20h

Training 3 - Thursday-9h00-10h00

x is a column from a dataframe.

My question is: how do I extract a specific word such as Sunday, Monday, Tuesday, etc.?

It should be like:

if x contains Saturday then that row should show Saturday in the New_column

if x contains Sunday then that row should show Sunday in the New_column

if x contains Tuesday then that row should show Tuesday in the New_column

I created a string that contains all weekdays

weekdays <- paste0(weekdays(seq(Sys.Date(), by =1,length = 7)), collapse = "|")

Suggestion 1:

In the following I try extracting weekdays from the column My_Data$Traininghour

My_Data$JOUR<- sub(sprintf('.*(%s).*', weekdays), '\\1',My_Data$Traininghour )

It gives the My_Data$JOUR column exactly the same content as the column My_Data$Traininghour.

Suggestion 2

My_Data$JOUR<-regmatches(My_Data$Traininghour, regexpr (weekdays, My_Data$Traininghour))

Suggestion 2 gives the following error:

Assigned data `regmatches(My_Data$Traininghour, regexpr (weekdays, My_Data$Traininghour))` must be compatible with existing data.

x Existing data has 4903 rows.
x Assigned data has 0 rows.
i Only vectors of size 1 are recycled.
Run `rlang::last_error()` to see where the error occurred.

Suggestion 3

My_Data$JOUR <-stringr::str_extract(My_Data$Traininghour, weekdays)

Suggestion 3 returns NA in every row of the column My_Data$JOUR.

I'm not sure what I'm doing wrong.

ev123R

1 Answer

Create a string that contains all the weekdays to use as the regex pattern.

weekdays <- paste0(weekdays(seq(Sys.Date(), by =1,length = 7)), collapse = "|")

In base R we can extract the weekdays from the x vector as follows:

sub(sprintf('.*(%s).*', weekdays), '\\1', x)
[1] "Monday"   "Saturday" "Friday"   "Tuesday"  "Sunday"   "Thursday"

or even

regmatches(x, regexpr(weekdays, x))
[1] "Monday"   "Saturday" "Friday"   "Tuesday"  "Sunday"   "Thursday"

It is simpler to use the stringr package, as below:

stringr::str_extract(x, weekdays)
[1] "Monday"   "Saturday" "Friday"   "Tuesday"  "Sunday"   "Thursday"
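
A minimal sketch of applying the same idea to a data frame column, using the sample rows from the question. The names `df`, `day_names`, and `found` are illustrative, and the English day names are hard-coded on the assumption that the data is in English: `weekdays()` returns names in the session's locale, so a generated pattern may not match the text.

```r
# Sample rows copied from the question
df <- data.frame(
  x = c("training 1 - Monday- 12h30-15h00",
        "Saturday 16h-20h",
        "Training 2 - Friday-06h-08h0",
        "training 1 - Tuesday - 13h30-15h00",
        "Sunday 16h-20h",
        "Training 3 - Thursday-9h00-10h00"),
  stringsAsFactors = FALSE
)

# Assumption: the data uses English day names, so spell them out
# instead of relying on the locale-dependent weekdays()
day_names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday")
pattern <- paste(day_names, collapse = "|")

# Position of the first match in each row, and which rows match at all
m <- regexpr(pattern, df$x)
found <- grepl(pattern, df$x)

# Start with NA everywhere, then fill only the rows that matched
df$New_column <- NA_character_
df$New_column[found] <- regmatches(df$x, m)

df
```

Rows without a weekday stay `NA`. Filling only the matched rows matters because `regmatches()` returns just the matched elements, so assigning its result directly to a column fails whenever some row has no match.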
Onyambu
  • I did as described above but it did not work. I added more information in the original post – ev123R Apr 20 '22 at 06:45
  • @ev123R you did not run it as given. Note the `.*(%s).*` part that you missed. Also I gave 3 solutions. You should try all of them before concluding that it does not work – Onyambu Apr 20 '22 at 07:03
  • I have tried all the 3 suggestion and have updated the original post with more info. – ev123R Apr 20 '22 at 08:03
  • Your mistake here, ev123R, is that (1) you provided an almost reproducible example, but not quite: if you had provided `df <- data.frame(x = c("training 1 - Monday- 12h30-15h00", ...))` it would have been perfect; (2) you didn't test the proposed solutions on the given example but on your full data, which we don't have access to. These details change everything. Welcome to SO, and consider reading: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – moodymudskipper Apr 20 '22 at 09:19
  • @ev123R are you using English or French? You must include the correct data in your question. E.g. copy the output of `dput(head(My_Data$Traininghour))` and paste it in the question – Onyambu Apr 20 '22 at 11:09
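
The three symptoms reported in the question (unchanged strings from `sub()`, a zero-length result from `regmatches()`, and `NA` from `str_extract()`) are what each function returns when the pattern matches nothing. One possible cause, which the thread does not confirm, is that `weekdays()` produced day names in a different language than the data. The sketch below reproduces the first two symptoms with a hypothetical French pattern, `french_days`, against English sample rows.

```r
x <- c("training 1 - Monday- 12h30-15h00", "Saturday 16h-20h")

# Hypothetical: the kind of pattern weekdays() might build in a French locale
french_days <- "lundi|mardi|mercredi|jeudi|vendredi|samedi|dimanche"

# sub() returns its input unchanged when the pattern does not match
unchanged <- sub(sprintf(".*(%s).*", french_days), "\\1", x)
identical(unchanged, x)

# regmatches() drops non-matching elements, so nothing is left
no_matches <- regmatches(x, regexpr(french_days, x))
length(no_matches)

# Inspect what the current session actually generates
Sys.getlocale("LC_TIME")
weekdays(seq(Sys.Date(), by = 1, length.out = 7))
```

Comparing the last output with `dput(head(My_Data$Traininghour))` would show whether the pattern and the data use the same language and capitalization.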