I have been trying to remove the line breaks from a data set in R. All the columns are factors, so before I can replace the "\n" with "NA", I need to change the data type from factor to character or Date. Here is my code and a sample data set for a better understanding:

  sku           Stockout_start     Stockout_End      create_date
  0BX-164463    \N                 1/29/2015 11:35   1/29/2015 11:35
  0BX-164463    2/11/2015 18:01    \N                2/11/2015 18:01
  0BX-164464    \N                 1/29/2015 11:38   1/29/2015 11:38
  0BX-164464    1/30/2015 4:38     \N                1/30/2015 4:38
  0BX-164481    \N                 1/28/2015 9:58    1/28/2015 9:58
  0BX-164482    \N                 1/29/2015 11:37   1/29/2015 11:37
  0BX-164482    2/4/2015 7:17      \N                2/4/2015 7:17
  0BX-164483    \N                 1/29/2015 11:37   1/29/2015 11:37
  0BX-164483    2/7/2015 4:37      \N                2/7/2015 4:37
  0BX-164496    \N                 1/29/2015 9:45    1/29/2015 9:45
  0BX-164497    \N                 1/28/2015 10:02   1/28/2015 10:02
  0BX-164498    \N                 1/29/2015 9:45    1/29/2015 9:45
  0BX-164499    \N                 1/29/2015 11:36   1/29/2015 11:36
  0BX-164500    \N                 1/29/2015 11:36   1/29/2015 11:36
  0BX-164501    \N                 1/29/2015 11:36   1/29/2015 11:36

I have been using the code below to correct the data:

stk[,2]<- as.Date(as.character(stk[,2]),format = "%y-%m-%d %H:%M:%S")
stk[,2]<- as.character(as.Date(stk[,2], origin = "1970-01-01"))

But this code changes my column 2 to NA. Kindly help.

akrun
Akash Singhi
  • I guess you could specify `na.strings` in `read.csv`/`read.table` – akrun Feb 23 '15 at 06:45
  • That is not working; I tried it. All the columns are factors, and changing the data type turns the entire 2nd column into NA – Akash Singhi Feb 23 '15 at 06:47
  • Could you show the code you used to read the data? If you need character columns, specify `stringsAsFactors=FALSE` – akrun Feb 23 '15 at 06:48
  • I have been using this code: `stk <- read.delim("C:/Users/abc/Downloads/stk.xls", stringsAsFactors=FALSE)` – Akash Singhi Feb 23 '15 at 06:52
  • If you had specified `stringsAsFactors=FALSE`, the columns would be character. Please check `str(stk)` – akrun Feb 23 '15 at 06:54
  • I am really confused, because this code works for the 3rd column: `stk[,3]<- as.Date(stk[,3], origin = "1970-01-01")`. It does not work for the 2nd column, which is really strange. – Akash Singhi Feb 23 '15 at 06:58
  • @AkashSinghi please paste in your question the result of `dput(tail(stk,40))` so that people can use your exact data set. Otherwise you will always be disappointed by the answers. – RockScience Feb 23 '15 at 07:04
  • @AkashSinghi It did work for me `head(as.Date(stk[,2], format='%m/%d/%Y %H:%M') ,2)# [1] NA "2015-02-11"` – akrun Feb 23 '15 at 07:06
  • @AkashSinghi You are not specifying the `format` correctly based on the input data shown – akrun Feb 23 '15 at 07:08
  • @akrun: Thanks a lot, it worked. Now how should I store the values in my data frame named "stk"? I am really new to R. Can you please tell me how to pick up this language quickly? – Akash Singhi Feb 23 '15 at 07:16
  • @AkashSinghi Do you mean that you want to replace the columns with the `Date` class? – akrun Feb 23 '15 at 07:17

3 Answers

You could specify `na.strings` and `stringsAsFactors=FALSE` in `read.csv`/`read.table`. (I changed the delimiter to `,` and saved the input data as `Akash.csv`.)

 stk <- read.csv('Akash.csv', header=TRUE, stringsAsFactors=FALSE,
       sep=",", na.strings="\\N")
 head(stk,3)
 #         sku  Stockout_start    Stockout_End     create_date
 #1   0BX-164463            <NA> 1/29/2015 11:35 1/29/2015 11:35
 #2   0BX-164463 2/11/2015 18:01            <NA> 2/11/2015 18:01
 #3   0BX-164464            <NA> 1/29/2015 11:38 1/29/2015 11:38
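
For readers without the file, the same call can be tried on inline text. This is only a sketch: `csv_text` and `stk_demo` are hypothetical names, and the three rows are copied from the sample data with a comma delimiter assumed.

```r
# Hypothetical inline copy of the first three sample rows, comma-delimited.
# Inside an R string literal, "\\N" stands for a backslash followed by N.
csv_text <- "sku,Stockout_start,Stockout_End,create_date
0BX-164463,\\N,1/29/2015 11:35,1/29/2015 11:35
0BX-164463,2/11/2015 18:01,\\N,2/11/2015 18:01
0BX-164464,\\N,1/29/2015 11:38,1/29/2015 11:38"

# Treat the \N marker as missing and keep text columns as character.
stk_demo <- read.csv(text = csv_text, header = TRUE,
                     stringsAsFactors = FALSE, na.strings = "\\N")

sapply(stk_demo, class)          # every column is "character"
is.na(stk_demo$Stockout_start)   # TRUE FALSE TRUE
```

Because the marker is turned into `NA` at read time, no factor-to-character conversion is needed afterwards.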

If you need to convert multiple columns to the "Date" class:

 stk[-1] <- lapply(stk[-1], as.Date, format='%m/%d/%Y %H:%M') 
 str(stk)
 #'data.frame': 15 obs. of  4 variables:
 #$ sku           : chr  "  0BX-164463" "  0BX-164463" "  0BX-164464" "  0BX-164464" ...
 #$ Stockout_start: Date, format: NA "2015-02-11" ...
 #$ Stockout_End  : Date, format: "2015-01-29" NA ...
 #$ create_date   : Date, format: "2015-01-29" "2015-02-11" ...
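
Note that `as.Date` keeps only the calendar date. If the hours and minutes matter, one possible variation is `as.POSIXct`. The sketch below uses a hypothetical two-row stand-in named `stk_demo`, and `tz = "UTC"` is an assumption, since the source data does not state a time zone.

```r
# Hypothetical two-row stand-in for stk, with \N already read as NA.
stk_demo <- data.frame(
  sku            = c("0BX-164463", "0BX-164463"),
  Stockout_start = c(NA, "2/11/2015 18:01"),
  Stockout_End   = c("1/29/2015 11:35", NA),
  create_date    = c("1/29/2015 11:35", "2/11/2015 18:01"),
  stringsAsFactors = FALSE
)

# as.POSIXct keeps the time of day; tz = "UTC" is an assumed time zone.
stk_demo[-1] <- lapply(stk_demo[-1], as.POSIXct,
                       format = "%m/%d/%Y %H:%M", tz = "UTC")

format(stk_demo$create_date, "%Y-%m-%d %H:%M")
# "2015-01-29 11:35" "2015-02-11 18:01"
```
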
akrun

You should indeed clean the data before calling as.Date.

Can you first make sure your data.frame was read with stringsAsFactors=FALSE, and then try

stk[stk$Stockout_start=="\\N","Stockout_start"]=NA

Then your code

stk[,2]<- as.Date(as.character(stk[,2]),format = "%y-%m-%d %H:%M:%S")
stk[,2]<- as.character(as.Date(stk[,2], origin = "1970-01-01"))
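
A self-contained sketch of the clean-then-convert idea, using a hypothetical one-column stand-in named `stk_demo`. Two details matter: a backslash in an R string literal has to be written as `"\\N"`, and the format string has to follow the month/day/year layout of the sample rows.

```r
# Hypothetical stand-in for a character column that still holds the \N marker.
stk_demo <- data.frame(
  Stockout_start = c("\\N", "2/11/2015 18:01", "\\N", "1/30/2015 4:38"),
  stringsAsFactors = FALSE
)

nchar("\\N")   # 2: one backslash plus the letter N

# Mark the placeholder as missing, then parse month/day/year.
stk_demo$Stockout_start[stk_demo$Stockout_start == "\\N"] <- NA
stk_demo$Stockout_start <- as.Date(stk_demo$Stockout_start,
                                   format = "%m/%d/%Y %H:%M")

stk_demo$Stockout_start
# NA "2015-02-11" NA "2015-01-30"
```
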
RockScience
  • It is not recognising "\N": Error: '\N' is an unrecognized escape in character string starting ""\N" – Akash Singhi Feb 23 '15 at 07:06
  • Then please read http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example before posting a question, and provide a sample data set. Otherwise you will continuously get answers that do not fit your data. – RockScience Feb 23 '15 at 07:18

A simpler approach is to use strptime:

# The sample data is month/day/year; as.POSIXct stores the result as a single column
stk[,2] <- as.POSIXct(strptime(stk[,2], "%m/%d/%Y %H:%M"))
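
A small sketch of why the order of `%d` and `%m` matters for data like the sample rows, which appear to be month/day/year. The vector `x` and the `tz = "UTC"` setting are assumptions for illustration; the source does not state a time zone.

```r
# Two values copied from the sample data.
x <- c("1/29/2015 11:35", "2/11/2015 18:01")

# Day-first format: 29 is not a valid month, so the first value becomes NA,
# while the second is silently read as 2 November instead of 11 February.
day_first <- strptime(x, "%d/%m/%Y %H:%M", tz = "UTC")
is.na(day_first)                    # TRUE FALSE
format(day_first[2], "%Y-%m-%d")    # "2015-11-02"

# Month-first format matches the sample rows. strptime returns POSIXlt,
# so the result is converted to POSIXct before it is stored.
month_first <- as.POSIXct(strptime(x, "%m/%d/%Y %H:%M", tz = "UTC"))
format(month_first, "%Y-%m-%d %H:%M")
# "2015-01-29 11:35" "2015-02-11 18:01"
```

The silent misparse of the second value is the more dangerous case, since it produces no `NA` to flag the problem.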
dax90
  • I don't understand how this is simple. The format is also not correct based on the OP's data. Also, there are multiple date columns. – akrun Feb 23 '15 at 14:05