Exclude NULL values from SQL query and retain old values in dataframe

Question

I would like to dynamically perform an SQL query (e.g. every day) and store the data into a reactive dataframe within a Shiny app. However, if the returned query contains NULL values or Zeros, I would like to retain the old values instead of it being replaced by the latest queried NULL or 0s.

For example:

Initial dataframe:

date       a b c d
2019-01-01 1 2 3 4
2019-01-02 2 3 4 5

Returned query stored as a dataframe:

date       a    b c d
2019-01-03 NA 4 3 0

After SQL Query is returned and binded into the dataframe using dplyr bind_rows, the final dataframe should look like this:

date       a b c d
2019-01-01 1 2 3 4
2019-01-02 2 3 4 5
2019-01-03 2 4 3 5   # 2 retained instead of NULL, 5 retained instead of 0

I understand that I probably have to write a conditional statement like the following pseudo code:

if (is.null(returned_query) & (returned_query != 0)) {
   // some code to replace value with the most recent value
} else {
  // proceed with bind_rows()
}

Unfortunately, I have no control over the SQL queries, and I have to do this in R. Anyone has a solution to this?

Wouldn't a `COALESCE` in your sql query dof the trick, would it? — DDS, Feb 12 '19 at 08:47
@DDS Unfortunately, I don't have direct control over the SQL query, and I have to do this in R... — ZPeh, Feb 12 '19 at 09:19

Knio · Answer 1 · 2019-02-13T15:09:53.687

0

You can use "which" to identify the indices for your desired criteria of NULL/NA/0 .

You can find out more on ?which

So for your question, a solution could be:

Setup test data


DT_old <- data.frame(date=c("2019-01-01","2019-01-02"),a=1:2,b=2:3,c=3:4,d=4:5, stringsAsFactors = FALSE)

# New Data from SQL query

DT_new <- data.frame(date="2019-01-03",a=0,b=2,c=3,d=4, stringsAsFactors = FALSE)

Bind new data to old one

DT3 <- rbind(DT_old, DT_new) # or use your dplyr logic to row bind old and new datasets

Calculate indices for your criteria and substitute with prev row in the bound dataset

# find out indices of the bound dataframe which meet your criteria
n_idx <- which((DT3 ==0)|is.na(DT3)|is.null(DT3), arr.ind=TRUE)

# substitute said indices with values from previous row

DT3[nrow(DT3),n_idx[,2]] <- DT3[nrow(DT3)-1,n_idx[,2]]

Please note that this assumes that only one row is added from each query.

edited Feb 13 '19 at 15:09

answered Feb 12 '19 at 11:07

Knio

13
4

Edited with some explanations. Thanks! – Knio Feb 12 '19 at 12:34
@Knio I've tried your solution, but it doesn't answer my question fully. Since in the new dataframe, a = 0, the final row should be (2,2,3,4) as a = 0 is ignored while b, c and d takes on the new queried values. It should not be a simple copying of the previous row (2,3,4,5)... – ZPeh Feb 13 '19 at 06:29
I misunderstood your question. Edited the solution accordingly – Knio Feb 13 '19 at 15:10

Exclude NULL values from SQL query and retain old values in dataframe

1 Answers1