I am working with the archived data set USA GOV Sample Data.

When I try to read this file in R, I get the error below:

result = fromJSON(textFileName)
Error in fromJSON(textFileName) : unexpected character 'u'

When I try to read it in Python, I get the following error:

import json 
records = [json.loads(line) for line in open(path)]

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
codecs.charmap_decode(input,self.errors,decoding_table)[0]
     24 
     25 class StreamWriter(Codec,codecs.StreamWriter):

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 4088: character maps to <undefined>

Can someone please help me understand how to read this kind of file?

Vineet

2 Answers

I couldn't get the code the OP provided in the question to work on my system either (Windows/RStudio/Jupyter). I dug around and found this for R, adapted to this case:

library(jsonlite)
# parse each line of the file as its own JSON document
out <- lapply(readLines("usagov_bitly_data2013-05-17-1368817803"), fromJSON)
df <- data.frame(Reduce(rbind, out))

Curiously, the error I got in R is different from yours:

result = fromJSON("usagov_bitly_data2013-05-17-1368817803")
#Error in parse_con(txt, bigint_as_char) : parse error: trailing garbage
#           [ 34.730400, -86.586098 ] } { "a": "Mozilla\/5.0 (Windows N
#                     (right here) ------^
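
The "trailing garbage" message suggests the file holds several JSON objects back to back rather than a single document, which would explain why it has to be parsed line by line. Here is a minimal Python sketch of the same behaviour, using a made-up two-record string instead of the real file (the sample text and variable names are my own assumptions):

```python
import json

# Made-up sample: two JSON objects, one per line, like the archive file seems to be.
text = '{"a": 1}\n{"b": 2}\n'

# Parsing the whole text as one document fails, because a second
# object follows the first one.
try:
    json.loads(text)
    whole_file_ok = True
except json.JSONDecodeError:
    whole_file_ok = False

# Parsing each line on its own works.
records = [json.loads(line) for line in text.splitlines()]
```

So the per-line approach is not a workaround for a broken file, it is just how this layout has to be read.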

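For Python, as juanpa mentioned, it seems to be a matter of encoding: the 'charmap' codec in your traceback suggests the file was opened with a legacy Windows code page rather than UTF-8. The following code, which sets the encoding explicitly, works for me.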

import json
import os

path = os.path.abspath("usagov_bitly_data2013-05-17-1368817803")
print(path)
# open explicitly as UTF-8 instead of relying on the platform default encoding
with open(path, encoding="utf8") as file:
    records = [json.loads(line) for line in file]
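
If you want something a bit more defensive, here is a rough sketch of a small helper that does the same thing but also skips blank lines. The function name `read_json_lines` is hypothetical, and I am assuming the file is UTF-8 (byte 0x9d is, for example, the last byte of a UTF-8 encoded right curly quote, which a Windows code page cannot decode):

```python
import json


def read_json_lines(path, encoding="utf-8"):
    """Parse a file that holds one JSON object per line (hypothetical helper)."""
    records = []
    # An explicit encoding avoids falling back to the platform default.
    with open(path, encoding=encoding) as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines instead of failing on them
                records.append(json.loads(line))
    return records
```

You would call it as `records = read_json_lines(path)`; pass a different `encoding` if your file turns out not to be UTF-8.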
MingH

Solution in R:

library(jsonlite)

# use one of the two connections below
# if you have a local file
conn <- gzcon(file("usagov_bitly_data2013-05-17-1368817803.gz", "rb"))
# or, if you read it from a URL
conn <- gzcon(url("http://1usagov.measuredvoice.com/bitly_archive/usagov_bitly_data2013-05-17-1368817803.gz"))

data <- stream_in(conn)
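
For the Python half of the question, a comparable sketch would read the compressed archive directly with the standard `gzip` module. The helper name `read_gzipped_json_lines` is made up, and I am assuming the archive contains UTF-8 text with one JSON object per line:

```python
import gzip
import json


def read_gzipped_json_lines(path, encoding="utf-8"):
    """Read a gzip-compressed file of one JSON object per line (hypothetical helper)."""
    records = []
    # "rt" opens the archive in text mode, decompressing and decoding on the fly.
    with gzip.open(path, "rt", encoding=encoding) as handle:
        for line in handle:
            if line.strip():  # ignore blank lines
                records.append(json.loads(line))
    return records
```

This avoids unpacking the .gz file by hand before parsing it.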
Melkor.cz