0

When retrieving tweets from Twitter, here is a snippet of the raw JSON received (captured via Fiddler:

[{"text":"\"California GOP Files FEC Complaint Over Obama Visit to Dying Grandmother\"\r\nhttp:\/\/url.com\/6jd5j5"}]

After doing some operations on it, involving deserialization, and then re-serializing it (via JSON.NET), it ends up in the database like this:

{"text": "\"California GOP Files FEC Complaint Over Obama Visit to Dying Grandmother\"\r\nhttp://url.com/6jd5j5"}

The only difference, being that the URLs don't have the backslash escapes around the forward slashes. (I'm not sure if this is a big deal, please chime in if it is)

My confusion actually is how to handle these escaped control characters. Running a SELECT query against my table in the MySQL client, Using MySQL's JSON_UNQUOTE function, it will unescape the characters. The \r\n are properly escaped, but it keeps the double quotes around the text which is interesting...

+----------+-------------------------------------------------------------------------------------------------------+
| user_id  | JSON_UNQUOTE(JSON_EXTRACT(tw.tweet_json, '$.text'))                                                   |
+----------+-------------------------------------------------------------------------------------------------------+
| 12844052 | "California GOP Files FEC Complaint Over Obama Visit to Dying Grandmother"
http://url.com/6jd5j5 |
+----------+-------------------------------------------------------------------------------------------------------+

Here's what it looks like when I don't use the JSON_UNQUOTE unescape function:

+-------------------------------------------------------------------------------------------------------------+
| JSON_EXTRACT(tw.tweet_json, '$.text')                                                                       |
+-------------------------------------------------------------------------------------------------------------+
| "\"California GOP Files FEC Complaint Over Obama Visit to Dying Grandmother\"\r\nhttp://url.com/6jd5j5" |
+-------------------------------------------------------------------------------------------------------------+

I need to export these tweets to a CSV file, to be used by Excel or Google Sheets.

I use the following specifier after my query:

INTO OUTFILE 'C:/ProgramData/MySQL/MySQL Server 5.7/Uploads/so.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';

Opening the CSV file with Excel displays the following: (The second row/entry uses the JSON_UNESCAPE feature)

Notice how the second entry, while using the JSON_UNESCAPE feature, shows excessive slashes.

enter image description here

Here's the CSV file opened in notepad:

  "\"\\\"California GOP Files FEC Complaint Over Obama Visit to Dying Grandmother\\\"\\r\\nhttp://url.com/6jd5j5\""
"\"California GOP Files FEC Complaint Over Obama Visit to Dying Grandmother\"
\
http://url.com/6jd5j5"

Question: How can I properly escape the tweet here, so that it can be read as original intended? Original Tweet Link

Edit The advice to use ESCAPED BY '"' from @Michael - sqlbot has brought me closer - but now when opening the CSV, the second part of the tweet (the URL) is in a new cell. I've verified that this happens in both Excel and Google Sheets:

enter image description here

Rendered CSV image (copy and pasting the text doesn't work well) enter image description here

contactmatt
  • 18,116
  • 40
  • 128
  • 186
  • You should be able to use `JSON_UNQUOTE() ... ENCLOSED BY '"' ESCAPED BY '"' ...` – Michael - sqlbot May 05 '18 at 14:27
  • ...because The CSV standard, such as it is, escapes `"` by doubling them up and then wrapping (enclosing) the entire string with one more set of qoutes, and adding the escape directive *should* generate valid CSV. Try this? – Michael - sqlbot May 05 '18 at 14:29
  • @Michael-sqlbot thanks for your advice here - it's gotten me closer. Please see my edit in the answer when you have a second. – contactmatt May 05 '18 at 16:28
  • Your CSV result isn't quite right. The first line is good, and the last line is good, but the second line should not be there at all, and it isn't clear what is triggering that... but my best guess is that there are embedded CR `\r` and LF `\n` and one of them needs to be eliminated. The root of the issue is that `SELECT ... INTO OUTFILE` is very primitive and CSV is also very primitive. `REPLACE(JSON_UNQUOTE(...),'\r','')` *might* be the trick. Or put `\n` here instead. – Michael - sqlbot May 05 '18 at 18:34

1 Answers1

1

After some digging around, some helpful comments from @Michael - sqlbot, and this answer, I got it working properly in Google Sheets and Excel with the following statement:

    SELECT REPLACE(JSON_UNQUOTE(JSON_EXTRACT({JSON_COL}, {JSON_PROP_TO_RETRIEVE})), '\r\n', '\n')
      ...
    INTO OUTFILE 'C:/ProgramData/MySQL/MySQL Server 5.7/Uploads/{FILE_NAME}.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '"' LINES TERMINATED BY '\r\n';
contactmatt
  • 18,116
  • 40
  • 128
  • 186