
Why do I keep getting an error when I try to create a table in R and then export that file to Excel? I am getting this error:

Error in write.table(data1, "data1.csv", col.names = NA, sep = ",", dec = ".", : unimplemented type 'list' in 'EncodeElement'

I am aware I will need to flatten the list, but I am not sure how to flatten it.
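A minimal sketch (toy data, not the actual API response) that reproduces this error and shows one way to spot and flatten the offending column:

```r
# toy data frame with a list column, mimicking what
# jsonlite::fromJSON() can return for nested JSON
df <- data.frame(site = c("BG1", "BG2"), stringsAsFactors = FALSE)
df$species <- list(c("NO2", "PM10"), "SO2")   # list column

# identify which columns are lists
sapply(df, class)
# site          species
# "character"   "list"

# write.csv(df, "df.csv")  # would fail: unimplemented type 'list'

# one simple flattening: collapse each list element to a single string
df$species <- vapply(df$species, toString, character(1))
sapply(df, class)  # now every column is character
```

Here `toString()` joins each cell's elements with `", "`, e.g. `"NO2, PM10"`; other separators can be passed via its `sep` argument.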

My Code:

library(httr)
library(jsonlite)
library(xml2)
library("rio")    
library("magrittr")

query <- "http://api.erg.kcl.ac.uk/AirQuality/Information/MonitoringSiteSpecies/GroupName=London/Json"

out <- GET(url=query)
http_status(out)
data1 <- content(out)

data1 <- fromJSON( "http://api.erg.kcl.ac.uk/AirQuality/Information/MonitoringSiteSpecies/GroupName=London/Json")
data1 <- as.data.frame(data1)


write.table(data1, file="data1.csv", row.names=FALSE, na="", col.names = FALSE, sep=",")

View(data1)

setwd("F:/")    
write.csv(data1,'data1.csv')
  • What is `class(data1)`? And what does `sapply(data1, class)` return? Sounds like you either don't have a data.frame or your data.frame contains a column type that cannot be written as a CSV. When asking for help, you should include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input so we can test to see what's going on. Clearly describe what you want the desired output to look like. – MrFlick Feb 09 '18 at 15:35
  • I have edited the question. I think the column type is a list, and therefore I keep getting the error. Would flattening work, or would I need to separate the list first? –  Feb 09 '18 at 15:46
  • If you run the API address in your Firefox browser, you will be able to see that "Species" is a list. –  Feb 09 '18 at 15:49
  • That data doesn't look anything like something you could easily put in a CSV file. You need to be more precise about what exactly you need the output to look like. Do you really need a CSV? What do you plan to do with the CSV later? Do you maybe just want to save in an R binary format? – MrFlick Feb 09 '18 at 15:49
  • I need it in a CSV file as a backup. I will be visualising the data. –  Feb 09 '18 at 16:01

2 Answers


Try this:

data1<-data.frame(lapply(data1, as.character), stringsAsFactors=FALSE)

write.table(data1, file="data1.csv", row.names=FALSE, na="", col.names = FALSE, sep=",")
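Why this works, sketched on toy data (not the API response): `as.character()` deparses each list cell into a single string, leaving only atomic columns, which `write.table()` can encode.

```r
# toy data frame with a list column
df <- data.frame(id = 1:2, stringsAsFactors = FALSE)
df$tags <- list(c("a", "b"), "c")

# coerce every column (including the list column) to character
flat <- data.frame(lapply(df, as.character), stringsAsFactors = FALSE)
sapply(flat, class)  # all "character" now

# write to a temporary file to confirm the export succeeds
tmp <- tempfile(fileext = ".csv")
write.table(flat, file = tmp, row.names = FALSE, sep = ",")
```

Note that `as.character()` turns multi-element cells into strings like `c("a", "b")`, which is why the result can look messy; `toString()` on each cell gives a cleaner `a, b` instead.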
Terru_theTerror
  • Oh, thank you, this sort of worked without errors. However, the data is very messy. Is it not possible to split the list into columns? –  Feb 09 '18 at 16:13
  • You can use `sep="|"` and then open the CSV with, for example, Excel. To format the data, go to the Data tab, select Text to Columns under Data Tools, then select Delimited with Other: "|". – Terru_theTerror Feb 09 '18 at 16:21
  • write.table(data1, file="data1.csv", row.names=FALSE, na="", col.names = FALSE, sep="|") – Terru_theTerror Feb 09 '18 at 16:24
  • Here is how to format CSV data in Excel; this could be helpful (in your case you have to use Delimited with Other: "|"): https://superuser.com/questions/407082/easiest-way-to-open-csv-with-commas-in-excel – Terru_theTerror Feb 09 '18 at 16:31

Overview

Converting the content of the httr::GET() request from JSON into a data frame requires a few data manipulation steps before you are able to export it as a file.

Reproducible Example

Initial httr::GET() returns multiple objects

# install necessary packages
install.packages( pkgs = c( "httr", "jsonlite", "magrittr" ))

# load necessary packages
library( httr )
library( jsonlite )
library( magrittr )

# store query
query <- "http://api.erg.kcl.ac.uk/AirQuality/Information/MonitoringSiteSpecies/GroupName=London/Json"

# GET the query
out <- httr::GET( url = query )

# base method
# Convert content from raw bytes to character
contents.out.base <- base::rawToChar( x = out$content )

# examine the first 30 characters
# from the contents in JSON form
base::substr( x = contents.out.base
              , start = 0
              , stop = 30
              )
# [1] "{\"Sites\":{\"Site\":[{\"@LocalAuth"

# transform the JSON string
# into a data frame
# set 'flatten' equal to TRUE
# to break out lists into individual columns
contents.out.base.df <- 
  jsonlite::fromJSON( txt = contents.out.base
                      , flatten = TRUE
  )

# view the data
class( contents.out.base.df ) # [1] "list"

# Interesting! It didn't return a data frame
names( contents.out.base.df )       # [1] "Sites"
names( contents.out.base.df$Sites ) # [1] "Site"
names( contents.out.base.df$Sites$Site )
# [1] "@LocalAuthorityCode" "@LocalAuthorityName" "@SiteCode"          
# [4] "@SiteName"           "@SiteType"           "@DateClosed"        
# [7] "@DateOpened"         "@Latitude"           "@Longitude"         
# [10] "@LatitudeWGS84"      "@LongitudeWGS84"     "@DataOwner"         
# [13] "@DataManager"        "@SiteLink"           "Species" 

# Note that 'Species' doesn't contain an '@' in front of it
# Why?
lapply( X = contents.out.base.df$Sites$Site, FUN = class )
# $`@LocalAuthorityCode`
# [1] "character"
# 
# $`@LocalAuthorityName`
# [1] "character"
# 
# $`@SiteCode`
# [1] "character"
# 
# $`@SiteName`
# [1] "character"
# 
# $`@SiteType`
# [1] "character"
# 
# $`@DateClosed`
# [1] "character"
# 
# $`@DateOpened`
# [1] "character"
# 
# $`@Latitude`
# [1] "character"
# 
# $`@Longitude`
# [1] "character"
# 
# $`@LatitudeWGS84`
# [1] "character"
# 
# $`@LongitudeWGS84`
# [1] "character"
# 
# $`@DataOwner`
# [1] "character"
# 
# $`@DataManager`
# [1] "character"
# 
# $`@SiteLink`
# [1] "character"
# 
# $Species
# [1] "list"

# save contents.out.base.df$Sites$Site as its own data frame
# without $Species
website.df <-
  contents.out.base.df$Sites$Site[
    , which( colnames( contents.out.base.df$Sites$Site ) != "Species" )
     ]

# check dim
dim( website.df ) # [1] 212  14

# view the first six rows
head( x = website.df )
#' @LocalAuthorityCode  @LocalAuthorityName @SiteCode
#' 1                   1 Barking and Dagenham       BG3
#' 2                   1 Barking and Dagenham       BG1
#' 3                   1 Barking and Dagenham       BG2
#' 4                   2               Barnet       BN2
#' 5                   2               Barnet       BN3
#' 6                   2               Barnet       BN1
#' @SiteName        @SiteType
#' 1   Barking and Dagenham - North Street         Kerbside
#' 2     Barking and Dagenham - Rush Green         Suburban
#' 3 Barking and Dagenham - Scrattons Farm         Suburban
#' 4                     Barnet - Finchley Urban Background
#' 5             Barnet - Strawberry Vale  Urban Background
#' 6              Barnet - Tally Ho Corner         Kerbside
#' @DateClosed         @DateOpened        @Latitude
#' 1 2011-05-25 00:00:00 2007-03-16 00:00:00        51.540444
#' 2                     1999-11-02 00:00:00        51.563752
#' 3                     1999-10-17 00:00:00        51.529389
#' 4 2012-04-20 00:00:00 2000-08-09 13:00:00        51.591901
#' 5 2002-05-15 00:00:00 2000-08-14 14:00:00 51.6008848453589
#' 6 2012-04-20 00:00:00 1998-12-20 12:00:00        51.614675
#' @Longitude @LatitudeWGS84 @LongitudeWGS84
#' 1           0.074418   6717454.5833   8284.17386585
#' 2           0.177891  6721627.34498   19802.7355367
#' 3           0.132857  6715476.18683   14789.5735883
#' 4          -0.205992  6726669.62886  -22930.9245475
#' 5 -0.172297542087178  6728279.54795  -19180.0746501
#' 6          -0.176607  6730751.38494  -19659.8013105
#' @DataOwner          @DataManager
#' 1 Barking and Dagenham King's College London
#' 2 Barking and Dagenham King's College London
#' 3 Barking and Dagenham King's College London
#' 4               Barnet King's College London
#' 5               Barnet King's College London
#' 6               Barnet King's College London
#' @SiteLink
#' 1 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BG3
#' 2 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BG1
#' 3 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BG2
#' 4 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BN2
#' 5 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BN3
#' 6 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BN1

A One-to-Many Relationship Requires website.df to Be Reshaped into a Long Data Frame

TL;DR the final data frame will have more than 212 rows.

At a single site, one local authority may identify more than one species.

This one-to-many relationship requires the final version of website.df to be reshaped into 'long' format, where one local authority's information may repeat because it identifies more than one species at a site.

To combine each object within contents.out.base.df$Sites$Site$Species with its corresponding row in website.df, I used a counter object. Prior to the use of lapply(), counter is set to zero.

The use of <<-, the super-assignment operator, allowed me to extract the corresponding website.df row each time a new object within contents.out.base.df$Sites$Site$Species was processed. Reading Using a counter inside an apply structured loop in R was very helpful in learning how to do this properly.

Note: Using cbind() to combine the objects within contents.out.base.df$Sites$Site$Species raises several warnings(). The SO post cbind warnings : row names were found from a short variable and have been discarded reveals that cbind() produced repeating row names and, to prevent duplicates, discarded them.
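The counter pattern can be illustrated in isolation (toy stand-in data, not the API response). As a side note, base R's `Map()` achieves the same pairing without the `<<-` side effect:

```r
# toy stand-ins: one parent row per list element
parent <- data.frame(id = 1:2, name = c("A", "B"), stringsAsFactors = FALSE)
children <- list(data.frame(x = 1:2), data.frame(x = 3))

# counter + <<-: each lapply() call advances the shared counter,
# so element i of 'children' is paired with row i of 'parent'
counter <- 0
joined <- lapply(children, function(i) {
  counter <<- counter + 1
  cbind(parent[counter, ], i, stringsAsFactors = FALSE)
})

# equivalent pairing with Map(), no shared state needed
joined2 <- Map(function(row, i) cbind(parent[row, ], i, stringsAsFactors = FALSE),
               seq_along(children), children)

# stacking repeats each parent row once per child row
do.call(rbind, joined)
```

The single-row `parent[counter, ]` is recycled across the rows of each child data frame, which is also what triggers the discarded-row-names warning described above.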

# create counter
# and set its value to zero
counter <- 0

# construct the column binding
# and replace the objects within the list
# with the concatenated version of
# that particular object's row in website.df
contents.out.base.df$Sites$Site$Species <-
  lapply( X = contents.out.base.df$Sites$Site$Species
          , FUN = function( i ){
            # add to counter
            counter <<- counter + 1

            # add columns from 
            # the counter row in website.df
            # onto the i object in X
            cbind(
              website.df[ counter , ]
              , i
              , stringsAsFactors = FALSE
            )
          } # end of anonymous function
  )

# There were 50 or more warnings (use warnings() to see the first 50)
warnings()
# Warning messages:
#   1: In data.frame(..., check.names = FALSE) :
#   row names were found from a short variable and have been discarded

Export Collapsed List as CSV

After collapsing the objects in contents.out.base.df$Sites$Site$Species into one data frame, I cleaned up the row and column names of website.df. Finally, website.df is ready to be exported into your working directory using the write.csv() function.

# collapse the individual objects
# in the list into one data frame
website.df <-
  data.frame( 
    do.call( what = rbind
             , args = contents.out.base.df$Sites$Site$Species
             )
    , stringsAsFactors = FALSE
    )

# check dim
dim( website.df ) # [1] 524  18

# rename the rows
rownames( x = website.df ) <-
  as.character( x = 1:nrow( x = website.df ) )

# Make syntactically valid column names
colnames( x = website.df ) <-
  base::gsub( pattern = "X."
              , replacement = ""
              , x = colnames( website.df )
  )

# view the first six rows
head( x = website.df )
# LocalAuthorityCode   LocalAuthorityName SiteCode
# 1                  1 Barking and Dagenham      BG3
# 2                  1 Barking and Dagenham      BG1
# 3                  1 Barking and Dagenham      BG1
# 4                  1 Barking and Dagenham      BG2
# 5                  1 Barking and Dagenham      BG2
# 6                  2               Barnet      BN2
# SiteName         SiteType
# 1   Barking and Dagenham - North Street         Kerbside
# 2     Barking and Dagenham - Rush Green         Suburban
# 3     Barking and Dagenham - Rush Green         Suburban
# 4 Barking and Dagenham - Scrattons Farm         Suburban
# 5 Barking and Dagenham - Scrattons Farm         Suburban
# 6                     Barnet - Finchley Urban Background
# DateClosed          DateOpened  Latitude Longitude
# 1 2011-05-25 00:00:00 2007-03-16 00:00:00 51.540444  0.074418
# 2                     1999-11-02 00:00:00 51.563752  0.177891
# 3                     1999-11-02 00:00:00 51.563752  0.177891
# 4                     1999-10-17 00:00:00 51.529389  0.132857
# 5                     1999-10-17 00:00:00 51.529389  0.132857
# 6 2012-04-20 00:00:00 2000-08-09 13:00:00 51.591901 -0.205992
# LatitudeWGS84 LongitudeWGS84            DataOwner
# 1  6717454.5833  8284.17386585 Barking and Dagenham
# 2 6721627.34498  19802.7355367 Barking and Dagenham
# 3 6721627.34498  19802.7355367 Barking and Dagenham
# 4 6715476.18683  14789.5735883 Barking and Dagenham
# 5 6715476.18683  14789.5735883 Barking and Dagenham
# 6 6726669.62886 -22930.9245475               Barnet
# DataManager
# 1 King's College London
# 2 King's College London
# 3 King's College London
# 4 King's College London
# 5 King's College London
# 6 King's College London
# SiteLink
# 1 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BG3
# 2 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BG1
# 3 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BG1
# 4 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BG2
# 5 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BG2
# 6 http://www.londonair.org.uk/london/asp/publicdetails.asp?site=BN2
# SpeciesCode SpeciesDescription DateMeasurementStarted
# 1         NO2   Nitrogen Dioxide    2008-01-01 00:00:00
# 2         NO2   Nitrogen Dioxide    2008-01-01 00:00:00
# 3         SO2    Sulphur Dioxide    1999-10-23 00:00:00
# 4         NO2   Nitrogen Dioxide    2007-11-21 00:00:00
# 5        PM10   PM10 Particulate    1999-10-17 00:00:00
# 6         NO2   Nitrogen Dioxide    2008-01-01 00:00:00
# DateMeasurementFinished
# 1     2011-05-25 00:00:00
# 2                        
# 3                        
# 4                        
# 5                        
# 6     2012-04-20 00:00:00

# Export as CSV
write.csv( x = website.df
           , file = "web_scrape.csv"
           , row.names = FALSE
)

# end of script #
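A quick round-trip check (sketch on toy data standing in for the final website.df) confirms the exported CSV reads back cleanly:

```r
# toy data frame standing in for the final website.df
df <- data.frame(SiteCode    = c("BG1", "BG2"),
                 SpeciesCode = c("NO2", "PM10"),
                 stringsAsFactors = FALSE)

# export to a temporary path, then read it back
tmp <- tempfile(fileext = ".csv")
write.csv(x = df, file = tmp, row.names = FALSE)
back <- read.csv(file = tmp, stringsAsFactors = FALSE)

identical(df, back)  # TRUE if the round trip preserved the data
```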

Session Info

Using sessionInfo().

R version 3.4.3 (2017-11-30)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.2

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] magrittr_1.5 jsonlite_1.5 httr_1.3.1  

loaded via a namespace (and not attached):
[1] compiler_3.4.3  R6_2.2.2        rgdal_1.2-16    tools_3.4.3    
[5] sp_1.2-7        curl_3.1        yaml_2.1.16     grid_3.4.3     
[9] lattice_0.20-35
Cristian E. Nuno
  • After using the write.csv() lines of code, why can't I find my CSV in the directory? –  Feb 15 '18 at 12:59
  • Try using `list.files()` to examine the files contained in your *current* working directory. The `file` argument in `write.csv()` relies on you knowing what file path your working directory is located at. – Cristian E. Nuno Feb 15 '18 at 13:26
  • The [`file`](https://stat.ethz.ch/R-manual/R-devel/library/utils/html/write.table.html) argument. `file = "/somefolder/somename.csv"`. Alternatively, see [`setwd()`](https://stat.ethz.ch/R-manual/R-devel/library/base/html/getwd.html). – Cristian E. Nuno Feb 15 '18 at 14:17
  • I have tried to do the same thing again, but this time querying: http://api.erg.kcl.ac.uk/AirQuality/Daily/MonitoringIndex/Latest/GroupName=London/JSON - and for some reason it does not show me all the columns. –  Feb 19 '18 at 17:05
  • @R07 I would recommend creating a new question showing what you've done and specifying where you are stuck. – Cristian E. Nuno Feb 19 '18 at 17:54
  • I have created a new question regarding this. If you could please help, that would be great. –  Feb 20 '18 at 11:48
  • @R07, it seems this new API requires more data manipulation steps than the one you posted in this question. I'll need some time to figure out how to tackle it. Would you mind if I reposted my code for discussion at [Code Review Stack Exchange](https://codereview.stackexchange.com/)? I think we'll both benefit from having others give their feedback on how to make the code more efficient. – Cristian E. Nuno Feb 23 '18 at 03:28
  • Yes. That is absolutely fine. –  Feb 23 '18 at 17:01
  • Wouldn't it be something similar to what was carried out for this API? There seems to be a list, but an extra column, with the new API. –  Feb 23 '18 at 17:14
  • Definitely something similar as far as reshaping the data to long format. There are two list elements of interest, one for the sites managed by each local authority and one for the species identified at each site. The problem for me is unstacking that information for each local authority, so that our final data frame grows from 33 rows to X rows, one row for each species identified at each site for each local authority. – Cristian E. Nuno Feb 23 '18 at 17:26