I have a naive function that replaces HTML entities, but it does a full string search for each entity. What is the best way to do a multi-string search and replace?

import std.array : replace;
import std.string : strip;

// sanitize() is not shown here.
string replace_entities(ref string x){
  return sanitize(x).replace("&rsquo;","'").replace("&lsquo;","'").replace("&apos;","'").replace("&ndash;","-").replace("&mdash;","-")
    .replace("&ldquo;","\"").replace("&rdquo;","\"").replace("&rdquo;","\"").replace("&#39;","'")
    .replace("&amp;", "&").replace("&ndash","-").replace("&mdash","-").replace("&quot;", "\"").strip();
}
ForeverConfused
    Check [this](https://stackoverflow.com/questions/15604140/replace-multiple-strings-with-multiple-other-strings) out, I guess it will help you. – Arnold Gandarillas Jun 10 '17 at 00:32

1 Answer

You can try a regex. I made a full example focused on performance :)

import std.stdio : writeln;
import std.algorithm : reduce, find;
import std.regex : ctRegex, Captures, replaceAll;   

/*
Compile time conversion table:
["from", "to"]
*/
enum HTMLEntityTable = [
    ["&rsquo;"  ,"'"  ],
    ["&lsquo;"  ,"'"  ],
    ["&apos;"   ,"'"  ],
    ["&ndash;"  ,"-"  ],
    ["&mdash;"  ,"-"  ],
    ["&ldquo;"  ,"\"" ],
    ["&rdquo;"  ,"\"" ],
    ["&rdquo;"  ,"\"" ],
    ["&#39;"    ,"'"  ],
    ["&amp;"    ,"&"  ],
    ["&ndash"   ,"-"  ],
    ["&mdash"   ,"-"  ],
    ["&quot;"   ,"\"" ]
];

/*
Compile time Regex String:
Use reduce to concatenate index 0 (the "from" column) of each HTMLEntityTable
row to form "&rsquo;|&lsquo;|..."
*/
enum regex_replace = ctRegex!( 
    reduce!((a, b)=>a~"|"~b[0])(HTMLEntityTable[0][0],HTMLEntityTable[1..$]) 
);

/*
Replace Function:
Find the matched string in HTMLEntityTable and return its replacement.
(Maybe I should use an Associative Array for HTMLEntityTable,
 but I think this way is faster.)
*/
auto HTMLReplace(Captures!string str){      
    return HTMLEntityTable.find!(a=>a[0] == str.hit)[0][1];
}

//User Function.
auto replace_entities( ref string html ){   
    return replaceAll!HTMLReplace( html, regex_replace);
}

void main(){
    auto html = "Start&rsquo;&lsquo;&apos;&ndash;&mdash;&ldquo;&rdquo;&rdquo;&#39;&amp;&ndash&mdash&quot;End";
    replace_entities( html ).writeln;
    //Output:
    //Start'''--"""'&--"End
}
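
The comment in the code above wonders about using an associative array instead of a linear `find`. Here is a minimal sketch of that variant, not a benchmarked replacement. The names `entityMap`, `lookupEntity` and `replaceEntitiesAA` are made up for illustration, the table is only a subset, and the generic pattern assumes every entity ends with a semicolon (so the bare `&ndash` / `&mdash` forms above would need their own handling). Unknown entities are left untouched.

```d
import std.regex : Captures, regex, replaceAll;
import std.stdio : writeln;

// Hypothetical lookup table (illustrative subset), filled once per thread at startup.
string[string] entityMap;

static this()
{
    entityMap = [
        "&rsquo;": "'",
        "&lsquo;": "'",
        "&apos;":  "'",
        "&#39;":   "'",
        "&ndash;": "-",
        "&mdash;": "-",
        "&ldquo;": "\"",
        "&rdquo;": "\"",
        "&quot;":  "\"",
        "&amp;":   "&"
    ];
}

// Called once per regex match; returns the replacement text for that match.
string lookupEntity(Captures!string m)
{
    // `in` yields a pointer to the value, or null if the key is absent.
    if (auto replacement = m.hit in entityMap)
        return *replacement;
    return m.hit; // unknown entity: keep it as-is
}

string replaceEntitiesAA(string html)
{
    // Assumed pattern: named or decimal entities terminated by ';'.
    auto re = regex(`&(?:#[0-9]+|[A-Za-z]+);`);
    return replaceAll!lookupEntity(html, re);
}

void main()
{
    auto html = "Tom &amp; Jerry &ndash; &ldquo;hi&rdquo; &nbsp;";
    auto result = replaceEntitiesAA(html);
    assert(result == "Tom & Jerry - \"hi\" &nbsp;");
    writeln(result);
}
```

Whether this is actually faster than the `find` over a 13-entry table is something you would have to measure. The main gain is that adding entities does not make the regex alternation longer.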
Patric