R/PowerBI API question: How to assign NA for the missing columns in the resultant data frame when the structure of source data frame is not consistent

Question

I am working on parsing data from PowerBI REST API for activity data. The way this API functions is - the same end point may return data with 10 fields today and tomorrow it may return with 15 fields. My goal is to run a scheduled process that would extract daily data (say into a SQL table). I predefined an output data frame with the columns that I need. But I am looking for a way to handle the case when - say I defined 12 columns in my output data frame and in today's REST API extract, the results did not contain 1 of those 12 columns. I would like to source them as NA (or an empty string). How to do that in R? Below is a block of code I am working with:

response<-httr::GET(url=RESTEndPoint,config=httpHeader)
parsedResp<-httr::content(response, "text",encoding = "UTF- 
8")%>%jsonlite::fromJSON(flatten = TRUE)
  
df<-as.data.frame(parsedResp$activityEventEntities)
outputDF<-df %>%
  dplyr::select(
    LogID='Id'
    ,CreationTimeD='CreationTime'
    ,Operation='Operation'
    ,OrganizationID='OrganizationId'
    
  )

Say if the field 'Operation' is missing from the parsed response, then this would throw the error: 'Error: Can't subset columns that don't exist.' since that's how dplyr:Select works. Is there a way to say, when the field 'Operation' is missing in the parsed response, assign it as NA and move on with the next iteration of the loop

Thank you!

Thanks for the response! That might work too but I was able to address this using bind_rows() by predefining an empty data frame with the columns I would like and then appending the rows of the extract in loop. At the end, I can do a dplyr::select to pick the columns I need and rename them. — Jade, Dec 30 '20 at 07:19

score 0 · Answer 1 · answered Dec 30 '20 at 09:02

Not sure if this is a solution to your problem without seeing the data. Delivery_ are examples of deliveries. You can also create an empty data frame with your desired columns needed called "data_delivery_cools_needed" and deleted it later on.

library(data.table)
library(tidyverse)

coos_needed <- c('a', 'b', 'c')

delivery_1 <- data.frame(a = 1, b = 2, x = 3, z = 4)
delivery_2 <- data.frame(c = 1)
delivery_3 <- data.frame(a = 1, b = 2, c = 3)
delivery_4 <- data.frame(f = 5)

# Create a list of all deliveries

all_deliveries <- mget(ls(pattern = "^delivery_"))

# Combine everything into one - fill = TRUE

all_deliveries_data_frame <- rbindlist(all_deliveries, fill = TRUE, idcol = "delivery_file")

final_data <- all_deliveries_data_frame %>% select(cols_needed)

R/PowerBI API question: How to assign NA for the missing columns in the resultant data frame when the structure of source data frame is not consistent

1 Answers1