
I'm working on something that will read a user's text messages and export them to a CSV file, which they can then download. The messages are retrieved from a third-party web interface: I am essentially using JavaScript to grab the HTML of each message and compile it as needed. The content of each message is appended to a variable which, once all messages are gathered, is passed to a new Blob, which is then downloaded.

The problem is that this web interface represents emoji as images rather than characters. So when a message containing an emoji is written to the file, the result looks like this:

"Blah blah blah <img height="18px" width="18px" class="emoji adjustedSpriteForMessageDisplay spriteEMOJI sprite-1f612" data-textvalue="%F0%9F%98%92" src="assets/blank.gif">"

Now, from this image tag, we can get two workable values:

The UTF-8 bytes in hex (from the data-textvalue attribute)

F09F9892

and the Unicode code point (from the sprite class name; I may be using the term wrong, I don't know much about encoding).

U+1F612

Now, what I want to do is take either of these values (whichever works better) and write it to the CSV file as the character itself, so that when viewing the CSV file in a text editor or what have you, it would appear as the actual emoji (😒 in this case).

I have no idea where to even start with this, though. Maybe it's as simple as wrapping the character values in some syntax, but I haven't been able to find anything on Google, because I'm not familiar enough with encoding to know what to search for.

GtwoK

2 Answers


I suggest preprocessing the data as you grab it from the webpage instead of extracting it from the string afterwards.

You can then use decodeURIComponent() to decode the percent-encoded string:

decodeURIComponent('%F0%9F%98%92')
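As a quick check of the call above, here is a minimal sketch showing that both values from the question lead to the same character. The function names `emojiFromTextValue` and `emojiFromCodePoint` are made up for illustration; `decodeURIComponent`, `String.fromCodePoint`, and `parseInt` are standard JavaScript.

```javascript
// Decode the percent-encoded UTF-8 bytes found in data-textvalue.
function emojiFromTextValue(textValue) {
  return decodeURIComponent(textValue);
}

// Alternative: build the character from the hex code point
// that appears in the sprite class name (e.g. "sprite-1f612").
function emojiFromCodePoint(hex) {
  return String.fromCodePoint(parseInt(hex, 16));
}

const fromBytes = emojiFromTextValue('%F0%9F%98%92');
const fromCodePoint = emojiFromCodePoint('1f612');
console.log(fromBytes === fromCodePoint); // both are U+1F612
```

The percent-encoded route is probably the simpler of the two, since the attribute value can be passed to `decodeURIComponent` unchanged.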

Combine that with jQuery to read the data-textvalue attribute:

decodeURIComponent($(element).data('textvalue'))
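If the message HTML has already been collected as a string and the DOM is no longer at hand, a regex-based fallback along the following lines might do. This is a rough sketch rather than the approach recommended above: the helper name `replaceEmojiImages` is hypothetical, and the pattern assumes the attribute is always written as `data-textvalue="..."` with double quotes, as in the sample from the question.

```javascript
// Replace every <img ... data-textvalue="..."> tag in an HTML string
// with the character its percent-encoded value stands for.
function replaceEmojiImages(html) {
  const imgPattern = /<img\b[^>]*\bdata-textvalue="([^"]*)"[^>]*>/g;
  return html.replace(imgPattern, (tag, textValue) => {
    try {
      return decodeURIComponent(textValue);
    } catch (err) {
      // Malformed percent-encoding: leave the original tag untouched.
      return tag;
    }
  });
}

const sample = 'Blah blah blah <img height="18px" width="18px" ' +
  'class="emoji adjustedSpriteForMessageDisplay spriteEMOJI sprite-1f612" ' +
  'data-textvalue="%F0%9F%98%92" src="assets/blank.gif">';
console.log(replaceEmojiImages(sample));
```

Working on the live elements, as suggested above, is likely more robust than pattern matching on markup; the regex version is only meant for the case where the string is all that is left.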

I created a simple example on JSFiddle. For some reason the emoji doesn't render correctly in the result screen in my browser, but that is a font issue. When looking at the result using a DOM inspector (or copying the text into a different application), the result is shown with a smiley.

olav

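The CSV file format carries no character-encoding information, so Excel has to guess, and it usually assumes a legacy single-byte encoding (the system's ANSI code page) rather than UTF-8. Non-ASCII characters such as emoji then come out garbled.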

https://en.wikipedia.org/wiki/Comma-separated_values#General_functionality

Microsoft Excel mangles Diacritics in .csv files?
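One commonly reported workaround is to start the file with a byte order mark (U+FEFF), which Excel tends to treat as a hint that the content is UTF-8. The sketch below assumes that behaviour holds for the Excel version in use; `csvField` and `buildCsv` are hypothetical helper names, and the sample rows are made up.

```javascript
// Quote a single CSV field: wrap it in double quotes and
// double any embedded quotes.
function csvField(value) {
  return '"' + String(value).replace(/"/g, '""') + '"';
}

// Join rows into CSV text, prefixed with a BOM (U+FEFF) as an encoding hint.
function buildCsv(rows) {
  const body = rows.map((row) => row.map(csvField).join(',')).join('\r\n');
  return '\uFEFF' + body;
}

const csv = buildCsv([
  ['sender', 'message'],
  ['Alice', 'Blah blah blah \u{1F612}'],
]);
```

In the browser, the resulting string could then be handed to `new Blob([csv], { type: 'text/csv;charset=utf-8' })`; strings passed to the Blob constructor are encoded as UTF-8.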

Jkarttunen