
How best do I sanitize text like

abc&#39; a>b<c & a<b>c

converting/displaying

abc&#39; a&gt;b&lt;c &amp; a&lt;b&gt;c

or in clear text

abc' a>b<c & a<b>c

so that I can use it via

myDiv.innerHtml = ...   or
myDiv.setInnerHtml(..., validator: myValidator, treeSanitizer: mySanitizer);

A text assignment, myDiv.text = ..., escapes every & as well as < and >, which destroys the valid apostrophe entity &#39;. HtmlEscape.convert(..) likewise escapes every & in all HtmlEscapeMode variants.

I could write my own sanitizer, but I hope I have overlooked some standard library call.

Jorg Janke
  • So you want to partially sanitize text? Why do you want to convert some characters to html entities, but not others? Why does the input text contain some "unsanitized" characters, but not others? – Tonio Jun 24 '15 at 16:31
  • Translations, e.g. from Google Translate and others, usually contain certain HTML entities. My current workaround is to convert them to Unicode so that I can assign the result to myElement **.text** . I hoped that I had either overlooked such a library call or that there is a solution for assignments to myElement **.innerHtml** – Jorg Janke Jun 27 '15 at 21:13

2 Answers


After some thought, I realized that using Validators or HtmlEscape/Mode was not the best way to solve the problem.

The original problem was that translation engines emit &#39; for the apostrophe, probably to avoid confusing it with the apostrophe misused as a single quote.

In summary, the best solution is to replace &#39; with the correct unicode character for the apostrophe, which is actually

The (correct) apostrophe U+0027 &#39; is disliked because fonts print it (incorrectly) straight, which graphic designers really hate, just like the straight ".

With that, you can assign the translated text to element.text; any problematic characters it contains are escaped automatically by Dart and rendered just fine.
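A minimal sketch of that replacement, under stated assumptions: the answer does not name the replacement character, so U+2019 (right single quotation mark) is assumed here as the typographic apostrophe, and `normalizeApostrophes` is a hypothetical helper name, not a library call.

```dart
// Assumption: U+2019 is used as the typographic apostrophe; the original
// answer does not say which character it has in mind.
const String typographicApostrophe = '\u2019';

/// Replaces the numeric entity `&#39;` with [apostrophe].
/// Hypothetical helper for illustration only.
String normalizeApostrophes(String text,
    {String apostrophe = typographicApostrophe}) {
  return text.replaceAll('&#39;', apostrophe);
}

void main() {
  final translated = 'abc&#39; a>b<c & a<b>c';
  final plain = normalizeApostrophes(translated);
  print(plain); // abc’ a>b<c & a<b>c

  // In a browser, the plain text can then be assigned directly and is
  // escaped on assignment:
  //   myDiv.text = plain;
}
```

Passing `apostrophe: "'"` instead would keep the straight U+0027 character if that is preferred.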

Jorg Janke


RegExp for HTML entities

import 'dart:convert';
import 'dart:html';

void main() {
  final htmlStr = r'abc&#39; a>b<c & a<b>' * 3;
  // Group 1: text before an entity. Group 2: the entity itself, either
  // numeric (&#39;) or named (&amp;). Group 3: trailing text with no entity.
  final reg = RegExp(
      r'(.*?)(&(?:#[1-9][0-9]{1,3}|[A-Za-z][0-9A-Za-z]+);)|(.*)');
  var resStr = '';
  for (final m in reg.allMatches(htmlStr)) {
    // Escape the plain-text parts, but pass existing entities through as-is.
    final g1 = htmlEscape.convert(m.group(1) ?? '');
    final g2 = m.group(2) ?? '';
    final g3 = htmlEscape.convert(m.group(3) ?? '');
    resStr += g1 + g2 + g3;
  }
  print(resStr);
  document.body?.setInnerHtml(resStr);
}
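The same idea can also be sketched with `String.splitMapJoin`, which avoids the explicit capture groups. This is only an illustration: `escapeKeepingEntities` is a hypothetical helper name, and the pattern assumes an entity is either a two-to-four-digit numeric reference such as `&#39;` or a named reference such as `&amp;`, always ending in a semicolon.

```dart
import 'dart:convert';

// Assumed entity shape: &#NN; (two to four digits) or &name;.
final RegExp entityPattern =
    RegExp(r'&(?:#[1-9][0-9]{1,3}|[A-Za-z][0-9A-Za-z]+);');

/// Escapes everything except substrings that already look like entities.
/// Hypothetical helper for illustration only.
String escapeKeepingEntities(String input) {
  return input.splitMapJoin(
    entityPattern,
    // Existing entities are passed through unchanged.
    onMatch: (m) => m[0]!,
    // Everything between entities is escaped as usual.
    onNonMatch: (text) => htmlEscape.convert(text),
  );
}

void main() {
  print(escapeKeepingEntities('abc&#39; a>b<c & a<b>c'));
  // abc&#39; a&gt;b&lt;c &amp; a&lt;b&gt;c
}
```

Because this version has no dependency on dart:html, it can be run outside the browser; the result can then be passed to setInnerHtml in the same way as above.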
Ticore Shih