1

I am working on parsing data from PowerBI REST API for activity data. The way this API functions is - the same end point may return data with 10 fields today and tomorrow it may return with 15 fields. My goal is to run a scheduled process that would extract daily data (say into a SQL table). I predefined an output data frame with the columns that I need. But I am looking for a way to handle the case when - say I defined 12 columns in my output data frame and in today's REST API extract, the results did not contain 1 of those 12 columns. I would like to source them as NA (or an empty string). How to do that in R? Below is a block of code I am working with:

response<-httr::GET(url=RESTEndPoint,config=httpHeader)
parsedResp<-httr::content(response, "text",encoding = "UTF- 
8")%>%jsonlite::fromJSON(flatten = TRUE)
  
df<-as.data.frame(parsedResp$activityEventEntities)
outputDF<-df %>%
  dplyr::select(
    LogID='Id'
    ,CreationTimeD='CreationTime'
    ,Operation='Operation'
    ,OrganizationID='OrganizationId'
    
  )

Say if the field 'Operation' is missing from the parsed response, then this would throw the error: 'Error: Can't subset columns that don't exist.' since that's how dplyr:Select works. Is there a way to say, when the field 'Operation' is missing in the parsed response, assign it as NA and move on with the next iteration of the loop

Thank you!

Phil
  • 7,287
  • 3
  • 36
  • 66
Jade
  • 56
  • 5
  • Thanks for the response! That might work too but I was able to address this using bind_rows() by predefining an empty data frame with the columns I would like and then appending the rows of the extract in loop. At the end, I can do a dplyr::select to pick the columns I need and rename them. – Jade Dec 30 '20 at 07:19

1 Answers1

0

Not sure if this is a solution to your problem without seeing the data. Delivery_ are examples of deliveries. You can also create an empty data frame with your desired columns needed called "data_delivery_cools_needed" and deleted it later on.

library(data.table)
library(tidyverse)

coos_needed <- c('a', 'b', 'c')

delivery_1 <- data.frame(a = 1, b = 2, x = 3, z = 4)
delivery_2 <- data.frame(c = 1)
delivery_3 <- data.frame(a = 1, b = 2, c = 3)
delivery_4 <- data.frame(f = 5)

# Create a list of all deliveries

all_deliveries <- mget(ls(pattern = "^delivery_"))

# Combine everything into one - fill = TRUE

all_deliveries_data_frame <- rbindlist(all_deliveries, fill = TRUE, idcol = "delivery_file")

final_data <- all_deliveries_data_frame %>% select(cols_needed)
MLEN
  • 2,162
  • 2
  • 20
  • 36