Phantom date format when reading 2018 dates from .csv files into R

Question

I have several .csv files 1 for each year from 2001 to 2018 each with a date column. They all are in m/d/y format in excel. When I read the files into R with

read.csv(x, stringsAsFactors = FALSE)

each file is correctly converted to y-m-d format for the date column except for the year 2018, which keeps it in the same m/d/y format.

This is an issue when I try to bring the dates into Lubridate because the m/d/y format is not compatible.

I am unsure why only the year 2018 is having this problem.

I have tried copying and pasting the dates into the numerical excel format and importing them into R with read.csv. I have copy and pasted in text format. I have also opened up a new excel and csv file and copied the data into those files. I also manually typed 2018 dates and imported into R with read.csv.

It is possible to fix this by:

y <- read.csv(g, stringsAsFactors = FALSE) %>%
  as.tibble()
y$date <- gsub("/", "-", y$date)
y$date <- parse_date_time(y$date, orders = c('mdy', 'ymd'))
y$date <- as_date(y$date)

But I find this disconcerting and would like to know why excel does this for 2018 only and if others are have encountered this.

This is an example of what my 2001.csv file looks like after

y <- read.csv(x, stringsAsFactors = FALSE) %>%
  as.tibble()
y

# # A tibble: 24 x 4
#    date       species Site_Code number
#    <chr>      <chr>     <int>  <int>
#  1 2001-08-11 WN          23      0
#  2 2001-10-12 WN          23      0
#  3 2001-01-11 EEE         27      0

And this is what the 2018.csv file looks like after being read in with the same code as above:

# A tibble: 84 x 4
 date       species Site_Code number
 <chr>      <chr>     <int>  <int>
1 4/16/2018  EEE         23      0
2 4/30/2018  EEE         23      1
3 5/7/2018   EEE         23      0

No error messages are generated

What does the data look like in the CSV file itself? It should be a plain text file that you can read. Are the dates coded differently in the 2018 file? Are there any "bad" values in that column? It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. — MrFlick, Oct 08 '19 at 15:29
2001 Data: date, virus, Site_Code, number 8/11/2001, WN, 23, 1 2018 Data: date, virus, Site_Code, number 4/16/2018, EEE, 23, 1 The files are just simple csv format, 4 columns with anywhere between 20 and 53 rows of dates with virus, site code and frequency information. All dates are in the same format m/d/y. I do not understand why when I read in these similar files with the same code that the output is different ( a / vs. a - between the date string). — Bvgiorocks, Oct 08 '19 at 19:26

score 0 · Answer 1 · answered Oct 08 '19 at 15:28

Your columns are still char and not date. You can verify this using str(y) and that is what <chr> header in the tibble is trying to tell you. Do the following so as to explicitly convert them to lubridate date format. An example for your y tibble.

y <- read.csv(x, stringsAsFactors = FALSE) %>%
  as.tibble()
# str(y)
#Classes ‘tbl_df’, ‘tbl’ and 'data.frame':  3 obs. of  4 variables:
# $ date      : chr  "4/16/2018" "4/30/2018" "5/7/2018"
# $ species   : chr  "EEE" "EEE" "EEE"
# $ Stite_Code: num  23 23 23
# $ number    : num  0 1 0

# date col is still chr
# do the following
y$date <- mdy(y$date)
str(y)

# Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 3 obs. of  4 variables:
# $ date      : Date, format: "2018-04-16" "2018-04-30" ...
# $ species   : chr  "EEE" "EEE" "EEE"
# $ Stite_Code: num  23 23 23
# $ number    : num  0 1 0

# Now date is Date format

Data

y <-structure(list(date = c("4/16/2018", "4/30/2018", "5/7/2018"), 
    species = c("EEE", "EEE", "EEE"), Stite_Code = c(23, 23, 
    23), number = c(0, 1, 0)), row.names = c(NA, -3L), class = c("tbl_df", 
"tbl", "data.frame"))

Yes I know they are still . What I want to know is why do my 2001 dates end up in y-m-d format and my 2018 dates in m/d/y after passing through the same read.csv argument. Both files are in the exact same format when I import them (m/d/y). Like I said I already have code for a fix using parse_dates, I just want to know why this is happening. — Bvgiorocks, Oct 08 '19 at 19:30

Phantom date format when reading 2018 dates from .csv files into R

1 Answers1

Data