
Sorry for the question. I started using RStudio a month ago and I keep running into things I've never learned. I have spent the past two days checking every website, help page and forum I could find, and this is driving me crazy.

I have a variable called Release that holds the release date of a song. Some dates follow the format %Y-%m-%d, whereas others give only a year. I'd like them all in the same format, but I'm struggling to modify only the year-only observations.

A brief sample, as it appears when I open the file in Word:

11/11/2011
01/06/2011
1974
1970
16/09/2003

I imported the data with:

music <- read.csv("music2.csv", header = TRUE, sep = ",", encoding = "UTF-8", stringsAsFactors = FALSE)

And this is how it appears in RStudio:

"2011-11-11" "2011-06-01" "1974" "1970" "2003-09-16" 

This is just a sample; I have 2200 observations.

The working code is

Modifdates<- ifelse(nchar(music$Release)==4,paste0("01-01-",music$Release),music$Release)
Modifdates

I obtain this :

"2011-11-11" "2011-06-01" "01-01-1974" "01-01-1970" "2003-09-16" 

I just would like them to be all with the same format "%Y-%m-%d". How can I do that?

So I tried this

as.Date(music$Release,format="%Y-%m-%d")

But I get NAs where I modified my dates.

Could anyone help?

Nigel
  • You say that the format is `%Y-%m-%d`, but it looks like `%d/%m/%Y` instead, so you had better add `01/01/`. To detect the year-only values you can check the number of characters with `nchar`: for instance, `nchar(music$Release)==4` will detect them, so you can add the day and month only to them. – nicola Nov 02 '18 at 09:54
  • @nicola Yes, when I open the file in word I have "%d/%m/%Y" but when I open it in RStudio, it's %Y-%m-%d ,this is why I used that. Honestly, it's all new for me and I don't get anything about what I'm supposed to do with that --" – Nigel Nov 02 '18 at 10:33
  • Please read this post: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example It is much easier to answer a question when working with a sample of your data than when guessing what your data might look like. – utubun Nov 02 '18 at 11:29
  • @utubun I've edited my post. I don't know what else I could tell you... – Nigel Nov 02 '18 at 11:47
  • @Nigel it's ok, it's just a good practice to give people some data to work with. – utubun Nov 02 '18 at 11:50
  • Also try using the great `lubridate` library, which does a lot of the work of matching the correct input format for you. – snaut Nov 02 '18 at 12:26

2 Answers


Welcome to SO. Please try to provide a reproducible example next time so that we can help you better. I think you could use this here:

testdates <- c("1974", "12-12-2012")
betterdates <- ifelse(nchar(testdates) == 4, paste0("01-01-", testdates), testdates)
betterdates
# [1] "01-01-1974" "12-12-2012"

EDIT: if your vector is a factor, convert it with as.character() first. If you then want to convert it back to a factor, you can use as.factor().

EDIT 2: do not call as.Date() before doing this. Convert to Date only after this modification.
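
To end up with a single format, the pasted piece has to follow the same order as the rest of the data. Here is a minimal sketch, assuming the full dates are already year-first as shown in the question; the vector `release` is a made-up stand-in for `music$Release`:

```r
# Hypothetical sample standing in for music$Release (character, not factor)
release <- c("2011-11-11", "2011-06-01", "1974", "1970", "2003-09-16")

# Append "-01-01" to year-only entries so every value is year-month-day
fixed <- ifelse(nchar(release) == 4, paste0(release, "-01-01"), release)

# Convert the modified vector (not the original column) to Date
release_dates <- as.Date(fixed, format = "%Y-%m-%d")
release_dates
```

Note that the conversion is applied to the modified vector. Calling as.Date() on the untouched column still sees the bare years, which do not match the format and therefore become NA.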

gaut
  • Thank you @utubun. When I write your code and change the "dat" into my variable, I only get "NA" everywhere... – Nigel Nov 02 '18 at 10:27
  • Your solution doesn't work either @gpier. My variable is a factor not a vector so the "nchar" isn't working "Error in nchar(music$Release) : 'nchar()' requires a character vector" – Nigel Nov 02 '18 at 10:29
  • So convert it to `character`, or, when you read your data (with `read.csv()`, I suppose), set `stringsAsFactors = F` inside the `read.csv()` call. – utubun Nov 02 '18 at 10:31
  • @utubun, Alright, when I add "stringsAsFactors = F" it works, THANK YOU! The only remaining issue is that all my dates in R are now written in the format %Y-%m-%d, except the ones I just modified, which are %d-%m-%Y. When I change the format with "as.Date" they turn into NA again... – Nigel Nov 02 '18 at 10:46
  • You have to modify the string to paste in order to get the same format for all dates. I followed the data you gave me and I can't guess what your data looks like... – gaut Nov 02 '18 at 11:00
  • welcome, if that suits you please consider accepting the answer. – gaut Nov 02 '18 at 11:03
  • @gpier This is what I have Modifdates<- ifelse(nchar(music$Release)==4,paste0("01-01-",music$Release),music$Release) Modifdates as.Date(music$Release,format="%Y-%m-%d") The first code is working properly but when I convert it in "as.date", NA's return – Nigel Nov 02 '18 at 11:08
  • How about `as.Date(music$Release,format="%d-%m-%Y")`? – gaut Nov 02 '18 at 11:42
  • @gpier It gives me NA's everywhere – Nigel Nov 02 '18 at 11:48
  • please share your current full code in the question and comment when done – gaut Nov 02 '18 at 11:56
  • @gpier Already done above. But here it is. First part is working but when I put as.Date, only got NA Modifdates<- ifelse(nchar(music$Release)==4,paste0("01-01-",music$Release),music$Release) Modifdates as.Date(music$Release,format="%d-%m-%Y") – Nigel Nov 02 '18 at 12:03
  • And before doing as.Date, I obtain this : "2011-11-11" "2011-06-01" "01-01-1974" "01-01-1970" "2003-09-16" . I just would like them to be all with the same format "%Y-%m-%d". How can I do that? – Nigel Nov 02 '18 at 12:09

Update

Use sub() to find dates that consist of a year only (the "(^[0-9]{4}$)" part), use a back-reference to append -01-01 to the end of the string (the "\\1-01-01" part), and finally convert the result to the Date class with as.Date(). The formats that as.Date() tries by default include "%Y-%m-%d", so you don't need to specify it:

dat <- c("2011-11-11", "2011-06-01", "1974", "1970", "2003-09-16")

When dat is of class character:

as.Date(sub("(^[0-9]{4}$)", "\\1-01-01", dat))

# "2011-11-11" "2011-06-01" "1974-01-01" "1970-01-01" "2003-09-16"

When dat is of class factor, sub() automatically coerces it to character for you:

dat <- as.factor(dat); dat

# 2011-11-11 2011-06-01 1974       1970       2003-09-16
# Levels: 1970 1974 2003-09-16 2011-06-01 2011-11-11

as.Date(sub("(^[0-9]{4}$)", "\\1-01-01", dat))

# "2011-11-11" "2011-06-01" "1974-01-01" "1970-01-01" "2003-09-16"
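
Applied to a data frame column, the same idea could look like the sketch below. The `music` data frame here is a hypothetical stand-in for the imported CSV, and `ReleaseDate` is a column name chosen only for illustration:

```r
# Hypothetical data frame standing in for the imported CSV
music <- data.frame(
  Release = c("2011-11-11", "2011-06-01", "1974", "1970", "2003-09-16"),
  stringsAsFactors = FALSE
)

# Pad year-only values, then parse; as.character() guards against factors
padded <- sub("^([0-9]{4})$", "\\1-01-01", as.character(music$Release))
music$ReleaseDate <- as.Date(padded, format = "%Y-%m-%d")

# Original values that still failed to parse, if any
unparsed <- music$Release[is.na(music$ReleaseDate)]
unparsed
```

Passing `format` explicitly means that strings not matching it come back as NA instead of raising an error, so `unparsed` gives a quick way to spot any remaining odd values among the 2200 observations.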
utubun
  • As mentioned in the comments above, instead of `as.Date()` you can call `lubridate::ymd(sub("(^[0-9]{4}$)", "\\1-01-01", dat))` to obtain the same results. – utubun Nov 02 '18 at 12:45
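
For the lubridate route, a possible alternative to padding the strings yourself is the `truncated` argument of `ymd()`, which is documented as allowing incomplete dates. This is only a sketch: it assumes the lubridate package is installed, and `release` is again a made-up sample rather than the real column:

```r
# Hypothetical sample standing in for music$Release
release <- c("2011-11-11", "2011-06-01", "1974", "1970", "2003-09-16")

# Run only when lubridate is available (assumption: it may not be installed)
if (requireNamespace("lubridate", quietly = TRUE)) {
  # truncated = 2 lets ymd() accept values that lack the month and the day
  release_dates <- lubridate::ymd(release, truncated = 2)
  print(release_dates)
}
```

It is worth checking the result for NA values afterwards, in the same way as with `as.Date()`.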