When retrieving tweets from Twitter, here is a snippet of the raw JSON received (captured via Fiddler:
[{"text":"\"California GOP Files FEC Complaint Over Obama Visit to Dying Grandmother\"\r\nhttp:\/\/url.com\/6jd5j5"}]
After doing some operations on it, involving deserialization, and then re-serializing it (via JSON.NET), it ends up in the database like this:
{"text": "\"California GOP Files FEC Complaint Over Obama Visit to Dying Grandmother\"\r\nhttp://url.com/6jd5j5"}
The only difference, being that the URLs don't have the backslash escapes around the forward slashes. (I'm not sure if this is a big deal, please chime in if it is)
My confusion actually is how to handle these escaped control characters. Running a SELECT query against my table in the MySQL client, Using MySQL's JSON_UNQUOTE function, it will unescape the characters. The \r\n
are properly escaped, but it keeps the double quotes around the text which is interesting...
+----------+-------------------------------------------------------------------------------------------------------+
| user_id | JSON_UNQUOTE(JSON_EXTRACT(tw.tweet_json, '$.text')) |
+----------+-------------------------------------------------------------------------------------------------------+
| 12844052 | "California GOP Files FEC Complaint Over Obama Visit to Dying Grandmother"
http://url.com/6jd5j5 |
+----------+-------------------------------------------------------------------------------------------------------+
Here's what it looks like when I don't use the JSON_UNQUOTE unescape function:
+-------------------------------------------------------------------------------------------------------------+
| JSON_EXTRACT(tw.tweet_json, '$.text') |
+-------------------------------------------------------------------------------------------------------------+
| "\"California GOP Files FEC Complaint Over Obama Visit to Dying Grandmother\"\r\nhttp://url.com/6jd5j5" |
+-------------------------------------------------------------------------------------------------------------+
I need to export these tweets to a CSV file, to be used by Excel or Google Sheets.
I use the following specifier after my query:
INTO OUTFILE 'C:/ProgramData/MySQL/MySQL Server 5.7/Uploads/so.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
Opening the CSV file with Excel displays the following: (The second row/entry uses the JSON_UNESCAPE feature)
Notice how the second entry, while using the JSON_UNESCAPE feature, shows excessive slashes.
Here's the CSV file opened in notepad:
"\"\\\"California GOP Files FEC Complaint Over Obama Visit to Dying Grandmother\\\"\\r\\nhttp://url.com/6jd5j5\""
"\"California GOP Files FEC Complaint Over Obama Visit to Dying Grandmother\"
\
http://url.com/6jd5j5"
Question: How can I properly escape the tweet here, so that it can be read as original intended? Original Tweet Link
Edit
The advice to use ESCAPED BY '"'
from @Michael - sqlbot has brought me closer - but now when opening the CSV, the second part of the tweet (the URL) is in a new cell. I've verified that this happens in both Excel and Google Sheets:
Rendered CSV image (copy and pasting the text doesn't work well)