I have a large character vector (109,094 elements) and would like to extract certain information from it by matching a pattern:

str(input) chr [1:109094] "{'asin': '0981850006', 'description': 'Steven Raichlen\'s Best of Barbecue Primal Grill DVD. The first three volumes of the si"| truncated ...

Printing input[1] gives the following content, the metadata description of a product:

[1] ("{'asin': '144072007X', 'related': {'also_viewed': ['B008WC0X0A', 'B000CPMOVG', 'B0046641AE', 'B00J150GAO', 'B00005AMCG', 'B005WGX97I'], 
         'bought_together': ['B000H85WSA']}, 
         'title': 'Sand Shark Margare Maron Audio CD', 
         'price': 577.15, 
         'salesRank': {'Patio, Lawn & Garden': 188289}, 
         'imUrl': 'http://ecx.images-amazon.com/images/I/31B9X0S6dqL._SX300_.jpg', 
         'brand': 'Tesoro', 
         'categories': [['Patio, Lawn & Garden', 'Lawn Mowers & Outdoor Power Tools', 'Metal Detectors']], 
'description': \"The Tesoro Sand Shark metal combines time-proven PI circuits with the latest digital technology creating the first.\"}") 

Now I would like to iterate over each element of the vector and extract asin, title, price, salesRank, brand and categories, saving them in a data.frame for easier handling.

As you might notice, the data originally comes from a JSON file. I tried to import it with the stream_in command, but that didn't work, so I just imported it with readLines. Please help! I'm getting a bit desperate, so any hint is appreciated!

The jsonlite package reports the following error:

lexical error: invalid char in json text.
                                      {'asin': '0981850006', 'descript
                     (right here) ------^
closing fileconnectionoldClass input connection.

Any new ideas on that? Given the many unanswered questions on this issue, it must be very relevant for newbies ;)

vanja_65
  • Have you considered using the `jsonlite` package? – akrun Mar 20 '16 at 16:14
  • Yep :) and I ran into a problem similar to this one: http://stackoverflow.com/questions/32158366/import-json-file-in-r-and-further-process-it-as-table?lq=1 – vanja_65 Mar 20 '16 at 16:16
  • But none of the suggestions there really helps...:( – vanja_65 Mar 20 '16 at 16:17
  • I had the same problem when parsing JSON in JavaScript. JSON requires double quotes (`"`); single quotes (`'`) are not allowed. That is why you are getting the invalid character error. Use `"` instead. – Kaspar Lee Mar 20 '16 at 16:54
  • Hi, Druzion. Thanks for your comment. Can you please be more specific: how should I specify the command? – vanja_65 Mar 20 '16 at 16:58
  • Your JSON text is not valid. All keys should be wrapped in double quotes (`"`) instead of single quotes. Once you fix the JSON content, jsonlite will happily parse it. – Saleem Mar 20 '16 at 17:00
  • Hi Saleem, thanks! How can I achieve that in R while importing the file? – vanja_65 Mar 20 '16 at 17:13
  • `the_fct <- function(what) gsub(sprintf('\'%s\'\\:\\s*(.*?),\\s*\'\\w+\'\\:|.', what), '\\1', gsub('\\n', '', input[1]), perl = TRUE); what <- c('asin', 'title', 'price', 'salesRank', 'brand' , 'categories'); lapply(what, the_fct)` – rawr Mar 20 '16 at 17:31
  • Hi rawr, this seems to work for extracting the content from the character vector. Thanks! – vanja_65 Mar 20 '16 at 17:40
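
For anyone landing here later, below is a minimal base-R sketch of how the regex from rawr's comment could be applied to every record and collected into a data.frame. The helper name `extract_field`, the variable `meta`, and the shortened sample records are assumptions made for illustration (the second record is made up to show a missing field). The regex is a heuristic rather than a real parser: it captures whatever sits between a key and the next `, 'key':` pair, so it returns nothing for a key that happens to be the last one in a record.

```r
# Shortened sample records in the same single-quoted style as the question.
# The first is trimmed from the question; the second is hypothetical.
input <- c(
  "{'asin': '144072007X', 'title': 'Sand Shark Margare Maron Audio CD', 'price': 577.15, 'salesRank': {'Patio, Lawn & Garden': 188289}, 'brand': 'Tesoro', 'categories': [['Patio, Lawn & Garden', 'Metal Detectors']], 'description': 'x'}",
  "{'asin': 'X000000000', 'title': 'Hypothetical item', 'description': 'y'}"
)

# Extract the raw value of one key from every record (vectorised over records).
extract_field <- function(records, key) {
  # Either capture the value between 'key': and the next , 'otherkey':
  # or match any single character, which is then replaced by nothing.
  pattern <- sprintf("'%s':\\s*(.*?),\\s*'\\w+':|.", key)
  flat  <- gsub("\n", "", records, fixed = TRUE)  # '.' does not match newlines
  value <- gsub(pattern, "\\1", flat, perl = TRUE)
  value <- gsub("^'|'$", "", value)               # drop surrounding quotes
  value[!nzchar(value)] <- NA_character_          # key not found in the record
  value
}

fields <- c("asin", "title", "price", "salesRank", "brand", "categories")
cols <- lapply(fields, function(key) extract_field(input, key))
names(cols) <- fields

meta <- as.data.frame(cols, stringsAsFactors = FALSE)
meta$price <- as.numeric(meta$price)
str(meta)
```

Nested values such as salesRank and categories stay as raw strings in this sketch; turning them into proper columns would need a further parsing step.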

0 Answers