0

I'm attempting to convert a .json file to a .csv so that I can perform analysis in R. I'm following steps suggested by others, but still run into problems (likely due to the large size of the json file). First, I pull the URL from the web:

import urllib

#first open html as .json
response = urllib.request.urlopen("http://trends.vera.org/data/county/1003/")
input = response.read()
print(input)

This function below I got from the linked question to flatten the json file.

#function to flatten .json file
def flattenjson( b, delim ):
    val = {}
    for i in b.keys():
        if isinstance( b[i], dict ):
            get = flattenjson( b[i], delim )
            for j in get.keys():
                val[ i + delim + j ] = get[j]
        else:
            val[i] = b[i]

    return val

The lines below take in a list and generate column names for the csv. This is where something is going wrong. Does anyone know how to fix this?

#find column names
input = map( lambda x: flattenjson( x ), input )
columns = map( lambda x: x.keys(), input )
columns = reduce( lambda x,y: x+y, columns )
columns = list( set( columns ) )
print(columns)

Finally, I write the json data to a .csv file.

#write to .csv file
with open( fname, 'wb' ) as out_file:
    csv_w = csv.writer( out_file )
    csv_w.writerow( columns )

    for i_r in input:
        csv_w.writerow( map( lambda x: i_r.get( x, "" ), columns ) )

Thanks in advance for any help.

hjohns12
  • 11
  • 5

2 Answers2

0

First of all you need decode response. Always use requests library for http requests. It can decode json.

import requests
response = requests.get("http://trends.vera.org/data/county/1003/")
data = response.json()

Your second part has another error. flattenjson requires 2 agruments and you provide only one. Second is delimiter in CSV file. This code works:

print(flattenjson(data, ';'))

If you don't need all data you can specify exact key:

flattenjson(data['yearlyData'], ';').
Raz
  • 7,508
  • 4
  • 27
  • 25
0

Doing this in R turned out to be far easier. Only one item of that list had tabular data, all of which was numeric. But it also had some funny formatting, thus the need for the grab_column() function. Result contains the data in tabular format.

library(rjson)    

tmp <- rjson::fromJSON(file = "http://trends.vera.org/data/county/1003/") 

grab_column <- function(x) {
  tmp <- as.character(x)
  if (length(tmp) == 0) tmp <- NA
  else tmp[tmp == "NULL"] <- NA
  as.numeric(tmp)
}

Result <- as.data.frame(lapply(foo, FUN = grab_column))
Year <- data.frame(year = as.numeric(names(foo[[1]])))
Result <- cbind(Year, Result)
hjohns12
  • 11
  • 5