
I'm building a little project using the RemoteOK API at https://remoteok.io/api. I am trying to turn the job descriptions it returns back into nicely formatted markdown for my frontend to parse.

I have no problem with the descriptions that have HTML tags inside them, but I can't really do anything with the ones that don't, as all of the newline and other characters appear garbled, as in the text form below.

String in text form:

" Popdog is exploring the relationship between gaming content creators and their communities. We believe that while creators are fantastic at building audiences, thereâs much yet to be done to create real community. But weâre in need of more great people to see that happen. Weâre a remote-first, venture-backed, fast-growing company that believes in the power of live streaming and content creation in gaming. Our team is small and singularly focused on changing the world of gaming for the better. This position spearheads brand marketing and community growth and engagement as they intersect. The role would focus on clear brand development, evolution, and reach across traditional and experimental marketing channels, as well as building out a foundational community development plan for users and talent alike. This role would act as a public-facing mouth-piece for the company, facilitating Popdogâs place within the industry and garnering support and intel from users to improve user experience. We aim to be the best at meeting viewers and creators where they are and supporting them. Benefits Medical, Dental, and Vision Insurance Company paid life insurance, short term and long term disability insurance 401k plan with 4% company matching Flexible work schedule Generous PTO Popdog, Inc. is an Equal Opportunity / Affirmative Action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, national origin, disability, or protected Veteran status "

The URL of the listing that has complete line breaks and paragraphs: https://remoteok.io/remote-jobs/101454-remote-brand-community-manager-popdog-x-loaded

I'm trying to get it back to the form in the link. It's not a duplicate of the other post, as neither currently has a solution. There has to be a way to parse these characters into line breaks.

What can I do to even view these characters correctly, let alone parse them? I don't even know what to begin looking up.

eveo
  • Those are valid characters in a string. You are viewing them correctly. – Andy Ray Jan 28 '21 at 00:38
  • @AndyRay is there anything at all I can do to parse this correctly? What are these characters called? Why are they appearing this way? Is this because I am parsing them as UTF-8 not ISO-8859-1? Thanks. – eveo Jan 28 '21 at 00:39
  • Does this answer your question? [Converting JSON response into correct encoding in JavaScript](https://stackoverflow.com/questions/57264420/converting-json-response-into-correct-encoding-in-javascript) – LS_ Jan 28 '21 at 00:43
  • *Is this because I am parsing them as UTF-8 not ISO-8859-1?* - it looks like exactly the opposite, parsing UTF-8 as something else. – tevemadar Jan 28 '21 at 00:45
  • @tevemadar the screenshot is using jsonviewer and viewing remoteok.io/api so i have no control over that. appears the same way in my code – eveo Jan 28 '21 at 00:46
  • Ah, these are Unicode characters like · (https://www.compart.com/en/unicode/U+00B7) and this API has a bug where their input encoding doesn't match their JSON encoding, mangling the characters. I'm not sure you can correct it. – Andy Ray Jan 28 '21 at 00:47
  • @AndyRay Aaaaaaand I just pivoted from my scraper. Time to work on my other project that doesn't rely on external data. Thanks. – eveo Jan 28 '21 at 00:49
  • Please post code, errors, sample data or textual output here as plain-text, not as images that can be hard to read, can’t be copy-pasted to help test code or use in answers, and are a barrier to those who depend on screen readers or translation tools. You can edit your question to add the code in the body of your question. For easy formatting use the `{}` button to mark blocks of code, or indent with four spaces for the same effect. The contents of a **screenshot can’t be searched, run as code, or copied and edited to create a solution.** – tadman Jan 28 '21 at 00:52
  • @tadman edited. posted a screenshot of the text because i dont know if it would lose formatting in the process but i updated it. i cant post code because im just making a fetch request to the remoteok.com/api, its like two lines. – eveo Jan 28 '21 at 00:57
  • Sometimes we can help undo damage if you post it as mangled text. We can then dump it into processing tools to find out what's going on, especially when it includes "invisible" characters your screenshot omits. – tadman Jan 28 '21 at 00:58
  • @tadman alright i updated specifically what im trying to do (parse that block into something with separate line breaks) – eveo Jan 28 '21 at 01:03
  • This is a form of [mojibake](https://en.wikipedia.org/wiki/Mojibake) so you need to identify what the desired text is. I'd focus on snippets like "Popdogâs". Is that intended to be "Popdog’s"? I'm guessing Windows-1252 is involved somewhere inadvertently. – tadman Jan 28 '21 at 01:12
  • @tadman yes that is the desired text, and there are also some line breaks – eveo Jan 28 '21 at 01:14
  • Check how you're downloading this. Check the encoding and that you're *respecting* it and doing any conversion if necessary. You may be interpreting a Windows-1252 document as UTF-8, or vice-versa. – tadman Jan 28 '21 at 01:15
  • im just doing a `await axios.get(url)` to `remoteok.io/api` and the data is returned with all the weird `â` symbols. Specifically the description fields that don't contain HTML tags, I can't parse into how they look on the website (with paragraphs and line breaks and all) – eveo Jan 28 '21 at 01:16
  • @eveo I've updated the answer, seems to work fine for me or am I missing something? – LS_ Jan 28 '21 at 01:24
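
The comments above suggest inspecting the mangled text for "invisible" characters. A minimal sketch of one way to do that: `showHidden` is a hypothetical helper (not part of any library or of the API) that prints every character outside printable ASCII as its code point, so control characters and line breaks become visible. The sample string is an assumption about what the API returns for "Popdog’s".

```javascript
// Hypothetical helper: replace every character outside printable ASCII
// (0x20..0x7E) with a visible \u{...} code point marker.
function showHidden(s) {
  return Array.from(s, (ch) => {
    const cp = ch.codePointAt(0);
    return cp >= 0x20 && cp <= 0x7e
      ? ch
      : '\\u{' + cp.toString(16).toUpperCase() + '}';
  }).join('');
}

// Assumed sample: "â" followed by two invisible control characters and a newline.
console.log(showHidden('Popdog\u00e2\u0080\u0099s\n'));
// Popdog\u{E2}\u{80}\u{99}s\u{A}
```

The sequence E2 80 99 is the UTF-8 encoding of ’ (U+2019), which would point to UTF-8 bytes having been read one byte per character.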

1 Answer

Could something like this work?

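    // Re-interpret a string whose UTF-8 bytes were decoded as Latin-1.
    // Note: escape() is deprecated, but still widely supported.
    function decode_utf8(s) {
      try {
        return decodeURIComponent(escape(s));
      } catch (e) {
        return s; // not mis-decoded UTF-8, so leave it unchanged
      }
    }

    fetch('https://remoteok.io/api')
      .then(response => response.json())
      .then(data => {
        data.forEach((item) => {
          // Guard: only decode entries that actually have a description string.
          if (typeof item.description === 'string') {
            item.description = decode_utf8(item.description);
          }
        });
        console.log(data);
      });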
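
A sketch of why the `decodeURIComponent(escape(s))` trick can work, and one way to do the same without the deprecated `escape()`: `escape()` percent-encodes each non-ASCII character up to U+00FF as a single `%XX` byte, and `decodeURIComponent()` then reads those bytes as UTF-8. The helper below does that byte reinterpretation explicitly with `TextDecoder`. `fixMojibake` is a hypothetical name, and the code assumes the text was UTF-8 that got decoded as ISO-8859-1 somewhere upstream; the sample string is likewise an assumption.

```javascript
// Hypothetical helper, assuming UTF-8 bytes were mis-decoded as ISO-8859-1.
function fixMojibake(s) {
  // Only strings whose code units all fit in one byte can be the result of
  // that mis-decoding; leave anything else untouched.
  for (let i = 0; i < s.length; i++) {
    if (s.charCodeAt(i) > 0xff) return s;
  }
  // Turn each character back into the byte it came from.
  const bytes = Uint8Array.from(s, (ch) => ch.charCodeAt(0));
  try {
    // fatal: true makes decode() throw on invalid UTF-8 instead of
    // silently inserting U+FFFD replacement characters.
    return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
  } catch (e) {
    return s; // bytes are not valid UTF-8, so the string was probably fine
  }
}

// ’ (U+2019) is the UTF-8 bytes E2 80 99; read one byte per character they
// become "â" followed by two invisible control characters.
console.log(fixMojibake('Popdog\u00e2\u0080\u0099s')); // Popdog’s
```

Strings that are already correct, or that are not valid UTF-8 when read as bytes, are returned unchanged, so the helper should be safe to apply to every description.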
LS_
  • Perfect, that worked! Had to run it through `unescape()` and now my markdown component displays it just fine with all the formatting. Thanks! – eveo Jan 28 '21 at 02:01
  • @eveo Glad it worked :) Sorry if it's missing info, I'll try to edit with an explanation later – LS_ Jan 28 '21 at 02:17