The output below is what I get when I save an R data frame in JSON format. My data frame has a mix of HTML links and accented characters. I have to work with this file in a PHP/HTML environment.

    library(jsonlite)
    output_json <- toJSON(output, dataframe = "rows", pretty = TRUE)
    write(output_json, file = "output.txt")

{
"PMID":"<a href= \"http://www.ncbi.nlm.nih.gov/pubmed/?term=19369233\"
target=\"_blank\">19369233</a>",
"Title":"Delayed achievement of cytogenetic and molecular response is
associated with increased risk of progression among patients with
chronic myeloid leukemia in early chronic phase receiving
high-dose or standard-dose imatinib therapy.",
"Author":"Quintás-Cardama A",
"Random author names":"Järås M", "Imrédi E", "Tímár J."
},
When I open the output.txt file or print the output on an HTML page, the accented letters in the first and last author names change to ?, e.g. Imr�di E.
When I use the PHP code below to read the JSON file, json_decode fails and returns NULL. From research on SO I am certain that the issue comes from the accented characters, and also in some cases from improper escaping of newlines (\r\n) or HTML tags.

<!-- language: lang-php -->

    $r_output = file_get_contents('output.txt');
    $array_json = json_decode($r_output, true);

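
To narrow down why `json_decode` returns NULL, a check along these lines might help. This is only a sketch: `decode_json_file` is a hypothetical helper name, it needs the mbstring extension, and the fallback assumes the file was written as Latin-1 (ISO-8859-1), which may not match what R actually produced on a given system.

```php
<?php
// Hypothetical helper (not part of any library): reads a JSON file,
// repairs a non-UTF-8 encoding if needed, and reports why decoding fails.
function decode_json_file($path)
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $path");
    }

    // json_decode() only accepts UTF-8 input; other encodings make it return NULL.
    if (!mb_check_encoding($raw, 'UTF-8')) {
        // ASSUMPTION: the bytes are Latin-1. Change the source encoding
        // if the file was written in something else (e.g. Windows-1252).
        $raw = mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');
    }

    $data = json_decode($raw, true);
    if ($data === null && json_last_error() !== JSON_ERROR_NONE) {
        // Surface the decoder's own explanation instead of a silent NULL.
        throw new RuntimeException('json_decode failed: ' . json_last_error_msg());
    }
    return $data;
}
```

Calling `decode_json_file('output.txt')` would then either return the decoded array or throw with the text from `json_last_error_msg()`, which should indicate whether the remaining problem is a syntax error or unescaped control characters such as raw newlines inside a string.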
I tried to fix this by following suggestions such as "How do I handle newlines in JSON?" and "PHP json_decode() returns NULL with valid JSON?", but could not solve the issue.
Hence I am tagging both PHP and R users: is there a better way to write the JSON from R to avoid this issue, or a way to clean the JSON before reading it in PHP?
Thank you for your help.