
I have a data frame that I read in from a .csv (or .xlsx; I've tried both), and one of its variables is a vector of dates.

The following code generates similar data:

    # Build 15 labels: "Date_1" ... "Date_15"
    Name <- paste("Date", seq_len(15), sep = "_")
    data1 <- data.frame(
      Name,
      # One due date per day from 2020-09-24 to 2020-10-08 (15 days)
      Due.Date = seq(as.Date("2020-09-24"), as.Date("2020-10-08"), by = "days")
    )

When I reference one of the cells directly, as in `str(project_dates$Due.Date[241])`, the date is read normally.

However, the exact position of the important dates varies from project to project, so I wrote a command that finds where the important dates are in the sheet: `str(project_dates[str_detect(project_dates$Name, "Date_17"), "Due.Date"])`

This code worked on a few projects, but on the current project it returns a character vector of length 2. One of the values is the date, and the other is NA. To make matters worse, the positions of the date and the NA are not fixed: the date is the first value for some lookups and the second for others (otherwise I would just take, e.g., the first item in the vector).

What is going on here, but more importantly, how do I fix this?!

Clarification on the second command:

When I was originally reading from an Excel file, the command was `project_dates[str_detect(project_dates$Name, "Date_17"), "Due.Date"]$Due.Date` because the subset returned a 1x1 tibble, and I needed the value inside the tibble.

When I switched to reading the data from a csv, I had to remove the `$Due.Date` because the subset now returned an atomic vector, so the `$` operator was no longer valid.
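The difference described above can be reproduced with a small sketch. It assumes the `tibble` package is installed and uses a made-up two-row data set rather than the real project file. A base data frame drops a single selected column to a plain vector, while a tibble keeps it as a tibble; extracting the column with `[[` first and subsetting afterwards should behave the same for both.

```r
library(tibble)  # assumption: the tibble package is installed

# Hypothetical stand-in for the real project data
df <- data.frame(
  Name = c("Date_1", "Date_2"),
  Due.Date = as.Date(c("2020-09-24", "2020-09-25"))
)
tb <- as_tibble(df)

rows <- df$Name == "Date_2"

class(df[rows, "Due.Date"])  # a plain Date vector: the single column is dropped
class(tb[rows, "Due.Date"])  # still a tibble, hence the extra $Due.Date

# Extract the column first, then subset: same code path for both inputs
from_df <- df[["Due.Date"]][rows]
from_tb <- tb[["Due.Date"]][rows]
identical(from_df, from_tb)
```

Written this way, the lookup does not need to change when the reader function (and so the class of the object) changes.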

Help me, Oh Blessed 1's (with) Knowledge! You're my only hope!

Edited to include an image of the data like the one that generates the error

TCarsel
  • Please [make this question reproducible](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) by including, at the very least, some example data which generates the issue with the commands used. – neilfws Sep 24 '20 at 23:33
  • I added a screenshot of the data very much like the one that generates the issue. I hope it helps, but the data that *doesn't* generate this issue also looks very much like it. It seems to be specific to this file, but I need to have my code robust to this issue if it happens again @neilfws – TCarsel Sep 24 '20 at 23:54
  • Also, thanks for taking a look! – TCarsel Sep 24 '20 at 23:56
  • Please supply the data as either plain text using for example `dput(mydataframe)`, or lines indented by 4 spaces, or as a link to the CSV/Excel file. Users cannot copy/paste data from images. – neilfws Sep 25 '20 at 00:03
  • I'm sorry, the code I added to my question isn't actually outputting the dates correctly, and my wife is yelling at me to go get dinner. I'll be back to fix it after. Thank you for your patience. – TCarsel Sep 25 '20 at 00:17
  • Do you want `Due.Date` where `Name = 'Date_17'` ? i.e `project_dates$Due.Date[project_dates$Name == 'Date_17']` – Ronak Shah Sep 25 '20 at 01:24
  • Tibbles made a design choice to not drop dimensions by default like data frames do. Compare `class(mtcars[1, 1])` (numeric) with `class(as_tibble(mtcars)[1, 1])` (still a tibble). Make sure your data is a tibble to start, not a vanilla data frame, and you'll get consistent behavior. – Gregor Thomas Sep 25 '20 at 02:01
  • Also, `str_detect` is great when you need to look for a pattern *inside* a string, or if you're using regex like wildcards. But for an exact match of the whole string it's simpler to use `==` or `%in%`. I'd suggest changing `str_detect(project_dates$Name, "Date_17")` to `project_dates$Name %in% "Date_17"` -- `%in%` can be safer with missing values than `==`. – Gregor Thomas Sep 25 '20 at 02:05
  • As to your question about the `NA`s, the code you share adapted to the data you share, `data1[str_detect(data1$Name, "Date_13"), "Due.Date"]` seems to work fine. Can you share a sample of data that demonstrates the problem, and be clear about what you want instead? – Gregor Thomas Sep 25 '20 at 02:08
  • I used `str_detect` because the naming conventions aren't consistent across consultants, so looking for a specific entire string ended up being too brittle. The original code was `project_dates[Name == "Date_17"]`, but now I use `str_detect` instead to find the shortest pattern inside each string. – TCarsel Sep 25 '20 at 02:45
  • And the weird thing is that after inspecting the raw files of the projects that do not give a problem and the file that does, I couldn't find the source of the problem. I looked at the properties of all the files, and honestly I'm dumbfounded. – TCarsel Sep 25 '20 at 02:58
  • Let me see if copy/pasting the data into a new file replicates the error (if so, I'll share the stripped data) – TCarsel Sep 25 '20 at 03:03

1 Answer


I feel sheepish.

I was able to remove the NAs with `data1 <- data1[!is.na(data1$Due.Date), ]`.

I had assumed that command would listwise delete any row with a missing value, so that if a cell contained the length-2 vector, I would lose the whole row of data. Instead, it removed the NA, leaving only the date.
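One plausible explanation, which the thread never confirms because the problem file was not shared, is that the file contains rows where both `Name` and `Due.Date` are missing (blank rows in a spreadsheet export, for example). `str_detect()` returns `NA` for a missing string, and indexing a data frame with an `NA` in the logical index yields an `NA` entry instead of skipping the row. The sketch below uses hypothetical data to show that behavior, why removing rows with a missing `Due.Date` helps in that case, and how wrapping the index in `which()` would avoid the issue without modifying the data. It assumes the `stringr` package is installed.

```r
library(stringr)  # assumption: the stringr package is installed

# Hypothetical data with one blank row between two real entries
project_dates <- data.frame(
  Name = c("Date_16", NA, "Date_17"),
  Due.Date = as.Date(c("2020-10-09", NA, "2020-10-10"))
)

hit <- str_detect(project_dates$Name, "Date_17")  # FALSE, NA, TRUE

# The NA in the index produces an NA entry next to the real date
with_na <- project_dates[hit, "Due.Date"]
length(with_na)

# which() keeps only the positions that are TRUE, so the NA disappears
no_na <- project_dates[which(hit), "Due.Date"]

# The fix from this answer: drop rows with a missing Due.Date first
cleaned <- project_dates[!is.na(project_dates$Due.Date), ]
after_fix <- cleaned[str_detect(cleaned$Name, "Date_17"), "Due.Date"]
```

Under this assumption, the position of the `NA` relative to the date would depend only on whether the blank row sits above or below the matching row, which would be consistent with the order varying from date to date.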

Thank you to everyone who commented and offered help!

TCarsel