I have a column "place" in my table which contains data about a place that looks like:
{ "id" : "94965b2c45386f87", "name" : "New York", "boundingBoxCoordinates" : [ [ { "longitude" : -79.76259, "latitude" : 40.477383 }, { "longitude" : -79.76259, "latitude" : 45.015851 }, { "longitude" : -71.777492, "latitude" : 45.015851 }, { "longitude" : -71.777492, "latitude" : 40.477383 } ] ], "countryCode" : "US", "fullName" : "New York, USA", "boundingBoxType" : "Polygon", "URL" : "https://api.twitter.com/1.1/geo/id/94965b2c45386f87.json", "accessLevel" : 0, "placeType" : "admin", "country" : "United States" }
From this, I want to extract the country name. I have tried the following code:
loc <- t1$place
loc = gsub('"', '', loc)
loc = gsub(',', '', loc)
to clean up the string and now it looks like this:
"{ id : 00ed6f0947c230f4 name : Caloocan City boundingBoxCoordinates : [ [ { longitude : 120.9607709 latitude : 14.6344661 } { longitude : 120.9607709 latitude : 14.7873208 } { longitude : 121.1015117 latitude : 14.7873208 } { longitude : 121.1015117 latitude : 14.6344661 } ] ] countryCode : PH fullName : Caloocan City National Capital Region boundingBoxType : Polygon URL : https://api.twitter.com/1.1/geo/id/00ed6f0947c230f4.json accessLevel : 0 placeType : city country : Republika ng Pilipinas }"
Now to extract the country name, I want to use the word() function:
word(loc, n, sep=fixed(" : "))
where n in the position of the country name I still did not count. But this function gives the correct output when n=1 but gives an error for any other vaue of n:
Error in word[loc, "start"] : subscript out of bounds
Why is that happening? The loc variable certainly has more words with that separation. Or can someone suggest a better way of extracting the country name from that field?
EDIT: t1 is the dataframe that consists my entire table. Presently I am interested only in the place field of my table which has the information in the above mentioned format. Hence I am trying to load the place field into a separate variable called "loc" using the basic assignment instruction:
loc <- t1$place
In order to read it as a JSON, the place field needs to be delimited by single quotes which it is not originally. I have 2 millions rows in my table so I really can't manually add the delimiters.