The problem:
I have a JSON file with 20,000 lines; each line is a web log entry representing a specific user's activity. I want to create a data frame in R to work with this data. Here is an example of one JSON line (chosen at random):
{"_type":"verifiedProductDetail","ts":1431820984214,"did":"7cd80696-4ede-49e4-a267-b887e684de32","profileId":"33021589-c159-4ec6-8c22-c0e5d9b600d9","preferenceIds":[],"price":115.0,"itemId":"10645","category":"/Binnenverlichting/Wandlampen","currency":1,"language":1,"name":"Wandlamp Linea 60 aluminium","url":"http://www.shop1.be/pagea/wandlampen.html_be","imageUrl":"http://vhetnevnejk.cloudfont.net/media/catalog/product/cache/7/thumbnail/450x/9df78eab33525dcdehl6e5fb8d27136e95/i/m/image_14583/Wandlamp.jpg","id":"871d275a-c856-4280-9cbd-f163b9f749e7","product":{"_id":"625363f4-0d80-3ff5-b091-174de3f9c9b2","domainId":"7cd80696-4ede-49e4-a267-b887e684de32","created":1427806290512,"updated":1436870460905,"itemId":"10645","prices":{"4":299.99,"1":69.99,"2":69.99,"5":299.99},"ratings":{"4":{"rate":1.0,"count":1,"created":1433447796660,"lan":4},"1":{"rate":0.9,"count":2,"created":1434355924529,"lan":1}},"categories":[{"language":3,"text":" Destockage","created":1427820384334},{"language":2,"text":" Outlet","created":1427883890399},{"language":1,"text":"/Binnenverlichting/Wandlampen","created":1431545171151},{"language":6,"text":" Outlet","created":1427876074772},{"language":4,"text":" Outlet","created":1427901573250},{"language":4,"text":" Beleuchtung nach Raum","created":1427827783211},{"language":11,"text":" Outlet","created":1427809161244}],"names":[{"language":3,"text":"Applique murale Linea 60cm en aluminium","created":1427820384334},{"language":2,"text":"Wall Lamp Linea 60 Aluminium","created":1427826729309},{"language":1,"text":"Wandlamp Linea 60 aluminium","created":1435695901730},{"language":6,"text":"Aplique de pared LINEA 60 aluminio ","created":1427819228360},{"language":11,"text":"Kinkiet Linea 60 aluminium","created":1427806290512},{"language":4,"text":"Wandleuchte Linea 60 Aluminium","created":1436870460905}],"imageUrl":"hhttp://vhetnevnejk.cloudfont.net/media/catalog/product/cache/7/thumbnail/450x/9df78eab335evwnrf5fb8d27136e95/i/m/image_14083/LineaWandlamp.jpg","url":"http://www.lampyiswiatlo.pl/kinkiet-linea.html","overwritePrinciples":{},"sku":"10645","stock":-1},"preferences":[]}
Here is what I did in R:
install.packages("rjson")
library("rjson")
SampleFile <- "filesample.json"
json_data <- fromJSON(paste(readLines(SampleFile), collapse=""))
str(json_data)
summary(json_data)
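A minimal sketch, assuming the file is newline-delimited JSON (one complete object per physical line, as described above). Pasting all lines into one string hands fromJSON() a run of concatenated objects rather than a single JSON value, which appears to be why only the first record survives. Parsing each line on its own avoids that. The two sample records and the temporary file are hypothetical stand-ins so the snippet runs by itself; with the real data, set SampleFile to "filesample.json" and skip the writeLines() call.

```r
library(rjson)

# Hypothetical stand-in for "filesample.json": two log records, one per line.
SampleFile <- tempfile(fileext = ".json")
writeLines(c(
  '{"_type":"verifiedProductDetail","ts":1431820984214,"itemId":"10645","price":115.0}',
  '{"_type":"verifiedProductDetail","ts":1431820984215,"itemId":"9858","price":26.9}'
), SampleFile)

# Read the physical lines and drop any blank ones (e.g. an empty last line).
json_lines <- readLines(SampleFile)
json_lines <- json_lines[nzchar(trimws(json_lines))]

# Parse each line separately: one list element per log record.
json_data <- lapply(json_lines, fromJSON)

length(json_data)     # number of records parsed (2 for this sample)
str(json_data[[2]])   # the second record is no longer lost
```

With the real file, length(json_data) would be expected to equal the number of non-empty lines (20,000 here), provided every line is a valid JSON object.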
This reads the file into R, and str() shows the extracted variables:
> str(json_data)
List of 18
$ _type : chr "verifiedProductDetail"
$ ts : num 1.43e+12
$ did : chr "7cd80696-4ede-49e4-a267-b887e684de32"
$ profileId : chr "8be1a552-9124-453d-a0aa-7124c99b56c6"
$ preferenceIds: list()
$ price : num 26.9
$ itemId : chr "9858"
$ category : chr ""
$ currency : num 1
$ language : num 6
$ name : chr "up Weiss"
$ profile :List of 13
..$ _id : chr "8be1a552-9124-453d-a0aa-7124c99b56c6"
..$ created : num 1.43e+12
..$ updated : num 1.43e+12
[and others]
My issue: as you can see, every scalar variable has length 1, meaning each variable holds only one value (from the first entry in the JSON file); all the other values have disappeared. This is easier to see with summary():
> summary(json_data)
              Length Class  Mode
_type              1 -none- character
ts                 1 -none- numeric
did                1 -none- character
profileId          1 -none- character
preferenceIds      0 -none- list
price              1 -none- numeric
itemId             1 -none- character
category           1 -none- character
currency           1 -none- numeric
language           1 -none- numeric
name               1 -none- character
url                1 -none- character
imageUrl           1 -none- character
id                 1 -none- character
profile           13 -none- list
product           14 -none- list
group             10 -none- list
preferences        0 -none- list
Summary: could you please advise me on what is wrong with my code, so that it keeps only the first value of each variable while all the others disappear?
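To get from per-line records to the data frame mentioned at the start, one option is to keep only top-level scalar fields, one column per field, and fill in NA where a record lacks a field or holds a nested list there (such as product or preferenceIds). This is a sketch under assumptions: get_field() is a hypothetical helper, the field list is an assumed subset taken from the str() output above, and the sample data is hypothetical, with the second record deliberately missing category to show the NA handling.

```r
library(rjson)

# Hypothetical sample file: one JSON object per line; record 2 has no "category".
SampleFile <- tempfile(fileext = ".json")
writeLines(c(
  '{"_type":"verifiedProductDetail","ts":1431820984214,"itemId":"10645","price":115.0,"category":"/Binnenverlichting/Wandlampen","preferenceIds":[]}',
  '{"_type":"verifiedProductDetail","ts":1431820984215,"itemId":"9858","price":26.9,"preferenceIds":[]}'
), SampleFile)

# Parse each line as its own record.
json_data <- lapply(readLines(SampleFile), fromJSON)

# Hypothetical helper: return the field if it is a single atomic value, else NA.
get_field <- function(record, field) {
  value <- record[[field]]
  if (is.null(value) || is.list(value) || length(value) != 1) NA else value
}

# Assumed subset of top-level fields (names taken from the str() output).
fields <- c("_type", "ts", "itemId", "price", "category", "preferenceIds")

# One column per field, one row per record.
columns <- lapply(setNames(fields, fields), function(f) {
  sapply(json_data, get_field, field = f)
})
logs_df <- data.frame(columns, stringsAsFactors = FALSE, check.names = FALSE)

str(logs_df)
```

Nested parts such as product or profile are simply left out here; they would need their own flattening step if their sub-fields are needed as columns.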