
The problem: I have a JSON file with 20000 lines; each line is a web log record representing a specific user's activity. I want to create a data frame in R to work with this data. Here is an example of one JSON line (chosen at random):

    {"_type":"verifiedProductDetail","ts":1431820984214,"did":"7cd80696-4ede-49e4-a267-b887e684de32","profileId":"33021589-c159-4ec6-8c22-c0e5d9b600d9","preferenceIds":[],"price":115.0,"itemId":"10645","category":"/Binnenverlichting/Wandlampen","currency":1,"language":1,"name":"Wandlamp Linea 60 aluminium","url":"http://www.shop1.be/pagea/wandlampen.html_be","imageUrl":"http://vhetnevnejk.cloudfont.net/media/catalog/product/cache/7/thumbnail/450x/9df78eab33525dcdehl6e5fb8d27136e95/i/m/image_14583/Wandlamp.jpg","id":"871d275a-c856-4280-9cbd-f163b9f749e7","product":{"_id":"625363f4-0d80-3ff5-b091-174de3f9c9b2","domainId":"7cd80696-4ede-49e4-a267-b887e684de32","created":1427806290512,"updated":1436870460905,"itemId":"10645","prices":{"4":299.99,"1":69.99,"2":69.99,"5":299.99},"ratings":{"4":{"rate":1.0,"count":1,"created":1433447796660,"lan":4},"1":{"rate":0.9,"count":2,"created":1434355924529,"lan":1}},"categories":[{"language":3,"text":" Destockage","created":1427820384334},{"language":2,"text":" Outlet","created":1427883890399},{"language":1,"text":"/Binnenverlichting/Wandlampen","created":1431545171151},{"language":6,"text":" Outlet","created":1427876074772},{"language":4,"text":" Outlet","created":1427901573250},{"language":4,"text":" Beleuchtung nach Raum","created":1427827783211},{"language":11,"text":" Outlet","created":1427809161244}],"names":[{"language":3,"text":"Applique murale Linea 60cm en aluminium","created":1427820384334},{"language":2,"text":"Wall Lamp Linea 60 Aluminium","created":1427826729309},{"language":1,"text":"Wandlamp Linea 60 aluminium","created":1435695901730},{"language":6,"text":"Aplique de pared LINEA 60 aluminio ","created":1427819228360},{"language":11,"text":"Kinkiet Linea 60 aluminium","created":1427806290512},{"language":4,"text":"Wandleuchte Linea 60 Aluminium","created":1436870460905}],"imageUrl":"hhttp://vhetnevnejk.cloudfont.net/media/catalog/product/cache/7/thumbnail/450x/9df78eab335evwnrf5fb8d27136e95/i/m/image_14083/LineaWandlamp.jpg","url":"http://www.lampyiswiatlo.pl/kinkiet-linea.html","overwritePrinciples":{},"sku":"10645","stock":-1},"preferences":[]}
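Because every line of the file is a complete JSON object by itself, a single line can be parsed on its own. A minimal sketch, assuming the `rjson` package is installed; the record below is a shortened, hypothetical version of the example line that keeps only a few of its fields:

```r
library(rjson)

# One shortened log record (hypothetical: only a few fields of the example line kept)
line <- '{"_type":"verifiedProductDetail","ts":1431820984214,"price":115.0,"itemId":"10645","product":{"itemId":"10645","prices":{"1":69.99,"2":69.99}}}'

rec <- fromJSON(line)        # a named list holding this one record
rec[["_type"]]               # "verifiedProductDetail"
rec$price                    # 115
rec$product$prices[["1"]]    # nested JSON objects become nested lists: 69.99
```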

Here is what I did in R:

     install.packages("rjson")    # one-time install
     library("rjson")
     SampleFile <- "filesample.json"
     # read all lines, then join them into one single string before parsing
     json_data <- fromJSON(paste(readLines(SampleFile), collapse=""))
     str(json_data)
     summary(json_data)
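A likely cause is the `paste(..., collapse="")` step: it glues the 20000 separate JSON objects into one long string, which is not a single valid JSON document, and `fromJSON()` appears to return only the first object it finds. A minimal sketch of parsing each line separately instead, assuming `rjson` is installed; a temporary file with two hypothetical records stands in for `filesample.json`:

```r
library(rjson)

# Stand-in for filesample.json: two hypothetical one-line records
SampleFile <- tempfile(fileext = ".json")
writeLines(c(
  '{"_type":"verifiedProductDetail","ts":1431820984214,"price":115.0,"itemId":"10645"}',
  '{"_type":"verifiedProductDetail","ts":1431820984999,"price":26.9,"itemId":"9858"}'
), SampleFile)

lines <- readLines(SampleFile)
lines <- lines[nzchar(trimws(lines))]   # drop blank lines, if any

# Parse every line on its own: one list element per log record
json_data <- lapply(lines, fromJSON)

length(json_data)        # number of records: 2 here, 20000 for the real file
json_data[[2]]$itemId    # "9858"
```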

Finally, I read the file into R and extracted the variables:

    > str(json_data)
    List of 18
     $ _type        : chr "verifiedProductDetail"
     $ ts           : num 1.43e+12
     $ did          : chr "7cd80696-4ede-49e4-a267-b887e684de32"
     $ profileId    : chr "8be1a552-9124-453d-a0aa-7124c99b56c6"
     $ preferenceIds: list()
     $ price        : num 26.9
     $ itemId       : chr "9858"
     $ category     : chr ""
     $ currency     : num 1
     $ language     : num 6
     $ name         : chr "up Weiss"
     $ profile      :List of 13
       ..$ _id          : chr "8be1a552-9124-453d-a0aa-7124c99b56c6"
       ..$ created      : num 1.43e+12
       ..$ updated      : num 1.43e+12

[and others]

My issue: as you can see, every variable has length 1, meaning that each variable holds only one value (from the first entry in the JSON file); all the other values have disappeared. This is easier to see with the summary() function:

     > summary(json_data)
                   Length Class  Mode     
     _type          1     -none- character
     ts             1     -none- numeric  
     did            1     -none- character
     profileId      1     -none- character
     preferenceIds  0     -none- list     
     price          1     -none- numeric  
     itemId         1     -none- character
     category       1     -none- character
     currency       1     -none- numeric  
     language       1     -none- numeric  
     name           1     -none- character
     url            1     -none- character
     imageUrl       1     -none- character
     id             1     -none- character
     profile       13     -none- list     
     product       14     -none- list     
     group         10     -none- list     
     preferences    0     -none- list  

Summary: Could you please advise me on what in my code causes it to keep only the first value of each variable and lose all the others?
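An alternative, if switching packages is acceptable, is `jsonlite::stream_in()`, which reads files that contain one JSON record per line and simplifies them into a data frame. A sketch, assuming the `jsonlite` package is installed and again using two hypothetical records in a temporary file:

```r
library(jsonlite)

# Stand-in for filesample.json: two hypothetical one-line records
SampleFile <- tempfile(fileext = ".json")
writeLines(c(
  '{"_type":"verifiedProductDetail","ts":1431820984214,"price":115.0,"itemId":"10645"}',
  '{"_type":"verifiedProductDetail","ts":1431820984999,"price":26.9,"itemId":"9858"}'
), SampleFile)

# stream_in() parses the file line by line and returns a data frame
df <- stream_in(file(SampleFile), verbose = FALSE)

nrow(df)    # one row per line: 2 here
df$price
```

With the real records, nested objects such as `product` would come back as nested data-frame columns; `jsonlite::flatten(df)` can spread them into ordinary top-level columns.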

  • I don't quite see your problem. Can you specify your expected output/result? Maybe you need to provide a minimal example of your dataset (see http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example/5963610#5963610) so that others can reproduce your problem; this will increase your chances of getting help. – mts Jul 21 '15 at 17:55
  • Thank you for your reply. The expected output should contain as many values as there are records in the JSON file (20000). The output I got has only one value per variable; the length of each variable should be 20000. – Junior Data Scientist Jul 22 '15 at 05:52
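To get a data frame in which every column has one value per record (length 20000 for the full file), each top-level field can be pulled out across all parsed records. A minimal sketch, assuming `rjson` and that `json_data` is a list with one parsed record per line; the helper `get_field()` is hypothetical, and fields that are missing or not a single value become `NA`:

```r
library(rjson)

# Two hypothetical parsed records (stand-ins for lapply(readLines(SampleFile), fromJSON))
json_data <- lapply(c(
  '{"_type":"verifiedProductDetail","ts":1431820984214,"price":115.0,"itemId":"10645","category":"/Binnenverlichting/Wandlampen"}',
  '{"_type":"verifiedProductDetail","ts":1431820984999,"price":26.9,"itemId":"9858","category":""}'
), fromJSON)

# Hypothetical helper: one field of one record, or na_value if absent / not a scalar
get_field <- function(rec, field, na_value) {
  x <- rec[[field]]
  if (is.null(x) || is.list(x) || length(x) != 1) na_value else x
}

# One column per field, one entry per record
df <- data.frame(
  type     = vapply(json_data, get_field, character(1), field = "_type",    na_value = NA_character_),
  ts       = vapply(json_data, get_field, numeric(1),   field = "ts",       na_value = NA_real_),
  price    = vapply(json_data, get_field, numeric(1),   field = "price",    na_value = NA_real_),
  itemId   = vapply(json_data, get_field, character(1), field = "itemId",   na_value = NA_character_),
  category = vapply(json_data, get_field, character(1), field = "category", na_value = NA_character_),
  stringsAsFactors = FALSE
)

str(df)   # 2 obs. of 5 variables here; 20000 obs. for the full file
```

Nested fields such as `product` or `profile` are left out here; they could be handled the same way by indexing one level deeper inside `get_field()`.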
