This is my first post on this site and I have very little experience posting on forums, so let me know if I'm doing anything wrong.

I just started my Data Analyst journey and I am working on my first project using R.

I am importing data from a Google Spreadsheet that has many empty rows at the end. I need to delete them because I will merge in other sheets, which also end in empty rows. I tried the solutions from "How to delete empty rows in multiple dataframes in a list" by OMMJREN, but none of them work.

The data comes from a Google Spreadsheet that receives responses from a Google Form, with timestamps.

My dataframe looks like this:

View(df)
     DATE                     DRIVER         DELIVERY_TIME
1     2021-12-01 12:20:37      John          NA (blank)
2     2021-12-01 12:32:51      Jack          NA (blank)
3     2021-12-01 13:07:22      Jill          2:16pm
...
10    NA (blank)               NA (blank)    NA (blank)
11    NA (blank)               NA (blank)    NA (blank)
summary(df)
      DATE                        ROUTE             DELIVERY_TIME 
 Min.   :2021-12-01 12:20:37   Length:667         Length:667
 1st Qu.:2021-12-09 13:24:17   Class :character   Class :character   
 Median :2021-12-17 06:03:06   Mode  :character   Mode  :character  
 Mean   :2021-12-16 10:01:41                                                                                                 
 3rd Qu.:2021-12-21 22:39:37                                                                                                 
 Max.   :2021-12-31 17:43:14                                                                                                 
 NA's   :311
str(df)
tibble [667 x 18] (S3: tbl_df/tbl/data.frame)
 $ DATE          : POSIXct[1:667], format: "2021-12-01 12:20:37" "2021-12-01 12:32:51" "2021-12-01 13:07:22" "2021-12-01 13:54:25" ...
 $ DRIVER        : chr [1:667] "John" "Jack" ...
 $ DELIVERY_TIME : chr [1:667] NA NA "2:16pm" NA ...

When I try the code below, I get this error:

df[!apply(df == "", 1, all), ]

Error in as.POSIXlt.character(x, tz, ...) : 
character string is not in a standard unambiguous format

When I try the dplyr version below, I get a similar error:

df <- df %>% 
  dplyr::filter(!(DATE==""))

Error in `dplyr::filter()`:
! Problem while computing `..1 = !(DATE == "")`.
Caused by error in `as.POSIXlt.character()`:
! character string is not in a standard unambiguous format
Backtrace:
  1. df %>% ...
  8. lubridate `==.POSIXt`(DATE, "")
 10. base::as.POSIXct.default(e2)
 13. base::as.POSIXlt.character(x, tz, ...)
 14. base::stop("character string is not in a standard unambiguous format")
 Error in dplyr::filter(., !(DATE == "")) : 
Caused by error in `as.POSIXlt.character()`:
! character string is not in a standard unambiguous format

So my guess is that I cannot delete my rows due to the data type of my first column.
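
To illustrate that guess, here is a minimal sketch in base R. It assumes the blank cells were imported as `NA` (the `NA's :311` line in `summary(df)` suggests this for `DATE`) rather than as empty strings. `df_demo` is a hypothetical stand-in built from a few values in this post, not the real data.

```r
# Hypothetical stand-in for the imported sheet (values taken from the post).
df_demo <- data.frame(
  DATE = as.POSIXct(
    c("2021-12-01 12:20:37", "2021-12-01 12:32:51", NA, NA),
    tz = "UTC"
  ),
  DRIVER = c("John", "Jack", NA, NA),
  DELIVERY_TIME = c(NA, "2:16pm", NA, NA),
  stringsAsFactors = FALSE
)

# A POSIXct column holds missing values as NA, not as "",
# so test for missingness instead of comparing against a string.
is_blank_date <- is.na(df_demo$DATE)

# Keep only the rows that carry a timestamp.
df_clean <- df_demo[!is_blank_date, ]
nrow(df_clean)
```

If this assumption holds, comparing `DATE` with `""` is never needed, which sidesteps the conversion of `""` to a date-time that the backtrace points at.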

Thank you in advance for your help and sorry if I'm making any mistakes.

EDIT1: Added back a third column that I had removed because I thought it was irrelevant, but it is relevant. I also have some other columns that are mostly blank and seldom contain data.

  • Have you tried `df <- na.omit(df)` or solutions from this post https://stackoverflow.com/questions/4862178/remove-rows-with-all-or-some-nas-missing-values-in-data-frame ? – Ronak Shah Feb 20 '22 at 02:38
  • `na.omit(df)` does not work, as I have other columns that contain NA values, so it ends up deleting the important rows. When I `View(df)` afterwards, there is no data left. I have edited my post to take those columns into account. – Raphael HHK Feb 20 '22 at 03:13
  • Just use `complete.cases` on the columns of the dataframe that you are concerned about. Also there's a difference between `NA` and `""` which might be what you meant by "blank". If you do not use dput to deliver a reproducible example there will remain ambiguity about what your data-object really has for its structure and values. – IRTFM Feb 20 '22 at 06:19
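
The suggestions in the comments above could be sketched roughly as follows, in base R. This is an assumption-laden illustration, not a tested answer for the real sheet: `df_demo`, `key_cols`, and the choice of `DATE` and `DRIVER` as the columns of concern are all hypothetical, and the third timestamp is made up for the example.

```r
# Hypothetical data mixing NA and "" blanks, loosely modelled on the post.
df_demo <- data.frame(
  DATE = as.POSIXct(
    c("2021-12-01 12:20:37", "2021-12-01 12:32:51",
      "2021-12-01 13:07:22", NA, NA),
    tz = "UTC"
  ),
  DRIVER = c("John", "Jack", NA, NA, NA),
  DELIVERY_TIME = c(NA, "2:16pm", "", "", NA),
  stringsAsFactors = FALSE
)

# Treat "" as missing, but only in character columns, so the POSIXct
# column is never compared against a string.
chr_cols <- vapply(df_demo, is.character, logical(1))
df_demo[chr_cols] <- lapply(df_demo[chr_cols], function(col) {
  col[!is.na(col) & col == ""] <- NA
  col
})

# Option 1: keep rows that are complete in the columns of concern only.
key_cols <- c("DATE", "DRIVER")
df_keyed <- df_demo[complete.cases(df_demo[, key_cols]), ]

# Option 2: drop only the rows in which every column is NA.
all_na <- rowSums(!is.na(df_demo)) == 0
df_nonempty <- df_demo[!all_na, ]
```

Option 1 is stricter: it also drops a row that has a timestamp but no driver. Option 2 removes only fully empty rows, which is closer to "delete the empty rows at the end" and leaves sparse columns such as `DELIVERY_TIME` alone.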
