I have a common problem for which I haven't found a proper solution. I have multiple XML strings that contain a specific tag (e.g. MIME_SOURCE), and I don't know which XML string contains which value. Nevertheless, I have to replace all occurrences.
I also have a dictionary whose keys are all possible values in the XML and whose values are the replacements. As I said, I don't know which entries apply to which XML.
For example, part of the first XML:
<MIME>
<MIME_SOURCE>\Web\Bilder Groß\1509_131_021_01.jpg</MIME_SOURCE>
</MIME>
<MIME>
<MIME_SOURCE>\Web\Bilder Groß\1509_131_021_01_MitWasserzeichen.jpg</MIME_SOURCE>
</MIME>
<MIME>
<MIME_SOURCE>\Web\Bilder Groß\icon_top.jpg</MIME_SOURCE>
</MIME>
Part of the second XML:
<MIME>
<MIME_SOURCE>\Web\Bilder klein\5478.jpg</MIME_SOURCE>
</MIME>
The dictionary looks like this:
{ @"\Web\Bilder Groß\1509_131_021_01.jpg", "/Web/Bilder Groß/1509_131_021_01.jpg" },
{ @"\Web\Bilder Groß\1509_131_021_01_MitWasserzeichen.jpg", "/Web/Bilder Groß/1509_131_021_01_MitWasserzeichen.jpg" },
{ @"\Web\Bilder Groß\icon_top.jpg", "icon_top.jpg" },
{ @"\Web\Bilder klein\5478.jpg", "5478.jpg" }
My main problem is that if I iterate through the dictionary for each XML string, the cost is the number of XML strings multiplied by the number of dictionary entries (n*m). This is really bad in my case, as there can be around a million XML strings and at least thousands of entries in the dictionary.
Currently I'm using string.Replace for each key of the dictionary for each XML.
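Roughly, that corresponds to something like the following sketch. The names `NaiveReplacer` and `ReplaceAll` are made up for illustration and are not my real code; the point is only that every dictionary entry triggers a full scan of every XML string.

```csharp
using System.Collections.Generic;

public static class NaiveReplacer
{
    // Hypothetical helper: applies every dictionary entry to one XML string.
    // Each Replace call scans the whole string, so the work per XML grows
    // with the number of dictionary entries (the "m" in n*m).
    public static string ReplaceAll(string xml, IDictionary<string, string> replacements)
    {
        foreach (var pair in replacements)
        {
            xml = xml.Replace(pair.Key, pair.Value);
        }
        return xml;
    }
}
```

Calling this once per XML string gives the n*m behaviour described above.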
Do you have a good idea how to speed up this process?
Edit:
I've changed the code to the following:
// Lazily matches each MIME_SOURCE element, including its tags.
var regex = new Regex(@"<MIME_SOURCE>[\s\S]*?</MIME_SOURCE>");
foreach (Match match in regex.Matches(stringForXml))
{
    // DoReplacements...
}
This meets the requirements for now, as the replacement is only done for each MIME_SOURCE element in the XML. I will also have a look at the mentioned algorithm.
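For reference, here is a minimal sketch of how the replacement step can be written so that each matched value is looked up in the dictionary directly instead of looping over all keys. It assumes that the text inside MIME_SOURCE matches a dictionary key exactly (no surrounding whitespace, no XML entity escaping); the names `MimeSourceReplacer` and `Replace` are placeholders.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MimeSourceReplacer
{
    // Group 1 captures the text between the opening and closing tag.
    private static readonly Regex MimeSource = new Regex(
        @"<MIME_SOURCE>([\s\S]*?)</MIME_SOURCE>",
        RegexOptions.Compiled);

    // Hypothetical helper: one regex pass over the XML, one dictionary
    // lookup per MIME_SOURCE element.
    public static string Replace(string xml, IReadOnlyDictionary<string, string> replacements)
    {
        return MimeSource.Replace(xml, match =>
        {
            var oldValue = match.Groups[1].Value;
            // Values without a dictionary entry are left untouched.
            return replacements.TryGetValue(oldValue, out var newValue)
                ? "<MIME_SOURCE>" + newValue + "</MIME_SOURCE>"
                : match.Value;
        });
    }
}
```

With this shape, the work per XML string depends on the length of the XML and the number of MIME_SOURCE elements in it, not on the number of dictionary entries, because `TryGetValue` on a `Dictionary<string, string>` is a hash lookup.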