This is a continuation of my earlier post, "String split with specified string without delimeter".

Use case #1: When searchedText is at the beginning or at the end of the string (watch), one of the fragments is empty. I replace that empty fragment with searchedText, and it works.

string watch = "Arrests as cops bust $100m money-laundering gang";
string searchedText = "Arrests as cops bust $100m";
string[] fragments = watch.Split(new string[] { searchedText }, StringSplitOptions.None);

Use case #2: When searchedText is in the middle of the string (watch), how should I handle it in the code below?

 // The loop below executes only twice, because Split returns at most 2 fragments here.
 // The problem comes when the searched value is in the middle: the loop should run 3 times,
 // because the searched value needs different logic (like changing the background color of the text),
 // while the head and tail keep their background color.
 // How do I insert the searched value between fragments[0] and fragments[1]?

 string watch = "Arrests as cops bust $100m money-laundering gang";
 string searchedText = "cops bust";

Complete code:

foreach (SharedStringItem sharedString in sharedStrings)
{
    string innerText = sharedString.InnerText; // This contains complete line (watch)

    if (innerText.IndexOf(searchText, StringComparison.OrdinalIgnoreCase) >= 0)
    {
        sharedString.RemoveAllChildren(); // Remove complete line from spreadsheet because we have to make it again as searched text needs to be highlighted 
        // Split the line so it will give blank for searched text and remaining line 
        string[] fragments = innerText.Split(new string[] { searchText }, StringSplitOptions.None);

        // loop through both words/line
        foreach (var item in fragments)
        {
             DocumentFormat.OpenXml.Spreadsheet.Text text = null;

             // If item is blank append the search text else append the remaining line /word
             if(string.IsNullOrEmpty(item))
                 text = new DocumentFormat.OpenXml.Spreadsheet.Text((item != "" ? " " : String.Empty) + searchText);
             else
                 text = new DocumentFormat.OpenXml.Spreadsheet.Text((item != "" ? " " : String.Empty) + item);

             text.Space = SpaceProcessingModeValues.Preserve;

             // New Run needs to be created for each splitted line/word, run is like a row in spreadsheet
             // You cannot create a single run because you need to take care of searched text as it needs to be highlighted before adding to the row
             Run run = new Run();
             run.Append(text);

             // This code should only be executed for searched text
             if (searchText.Equals(text.InnerText, StringComparison.Ordinal))
             {
                 if (run.RunProperties == null)
                     run.RunProperties = new RunProperties();

                 run.RunProperties.Append(new Color { Rgb = "008000" });
                 run.RunProperties.Append(new DocumentFormat.OpenXml.Spreadsheet.Bold());

             }

             // This line add individual run (Example -> Arrests as + <highlight searched text> + remaining text
            sharedString.Append(run);
        }
    }
}


A case where it does not work:

searchedText = merrylands
watch = "httdailytelegraph.com.au/newslocal/parramatta/trio-charged-over-alleged-100m-money-laundering-syndicate-at-merrylands-guildford-west/news-story/92ba3163ce58ad8b49989131fa7a5d8e"
Bokambo
  • I tried my best. Where do you feel it should be improved? – Bokambo Apr 24 '20 at 20:05
  • Look at it now - that should be your **minimal standard** going forward .... – marc_s Apr 24 '20 at 20:15
  • This seems now to be an openxml question about how to create highlighted search results. If your split returns `{ "one", "two", "", "done" }` you want to output `searchtext` between items of that array, which would yield something with the structure of: `"onesearchtexttwosearchtextsearchtextdone"` is that right? Except there are openxml-specific things like Runs at play here, right? – Wyck Apr 25 '20 at 16:15

1 Answer

Updated: you can try this.

        string text = "Trio charged over alleged $100m money laundering syndicate at Merrylands, Guildford West";
        string searchtext = "charged over";
        // Zero-width lookahead: split just before each occurrence, so the search text stays in the fragment
        string searchtextPattern = "(?=" + Regex.Escape(searchtext) + ")";

        string[] fragments = Regex.Split(text, searchtextPattern);
        // fragments will have two elements here
        // fragments[0] - "Trio " (the trailing space is kept)
        // fragments[1] - "charged over alleged $100m money laundering syndicate at Merrylands, Guildford West"
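
The pattern above passes the search text through Regex.Escape. As a rough illustration of why that matters for text such as "$100m" from the question, here is a minimal sketch. The class and method names (EscapeDemo, SplitBefore) are made up for this example and are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class EscapeDemo
{
    // Splits text just before each occurrence of searchText using a lookahead.
    // When escape is false, searchText is used as a raw regex pattern.
    public static string[] SplitBefore(string text, string searchText, bool escape)
    {
        string literal = escape ? Regex.Escape(searchText) : searchText;
        return Regex.Split(text, "(?=" + literal + ")");
    }

    public static void Main()
    {
        string text = "Arrests as cops bust $100m money-laundering gang";

        // Unescaped, "$" acts as an end-of-input anchor, so nothing matches
        // and the whole input comes back as a single element.
        Console.WriteLine(SplitBefore(text, "$100m", false).Length);

        // Escaped, "$" is matched literally and the split lands before "$100m".
        Console.WriteLine(SplitBefore(text, "$100m", true).Length);
    }
}
```

With the unescaped pattern the search text is silently never found, which is easy to miss because no exception is thrown.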

Now you can split the fragment that contains the search text again, i.e. fragments[1] in this case. See the code below:

            var stringWithoutSearchText = fragments[1].Replace(searchtext, string.Empty);

You need to check whether each fragment contains the search text. You can do that in your foreach loop over fragments by adding the check below:

     foreach (var item in fragments)
     { 
        if (item.Contains(searchtext))
        { 
          string stringWithoutSearchText = item.Replace(searchtext, string.Empty);
        }
     }

I tried to fit it into your code. You can try something like this

foreach (SharedStringItem sharedString in sharedStrings)
        {
            string innerText = sharedString.InnerText; // This contains complete line (watch)

            if (innerText.IndexOf(searchText, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                sharedString.RemoveAllChildren(); // Remove complete line from spreadsheet because we have to make it again as searched text needs to be highlighted 
                                                  // Split the line so it will give blank for searched text and remaining line 

                var searchtextPattern = "(?=" + Regex.Escape(searchText) + ")";

                string[] fragments = Regex.Split(innerText, searchtextPattern);

                // loop through both words/line
                foreach (var item in fragments)
                {
                    if (!string.IsNullOrEmpty(item))
                    {

                        // Check whether the item contains the search text

                        if (item.Contains(searchText))
                        {
                            // GetRun() is called twice here: once for the search text, once for the rest

                            string stringWithoutSearchText = item.Replace(searchText, string.Empty);
                            // in your example the method argument will be "charged over"
                            var run = GetRun(new DocumentFormat.OpenXml.Spreadsheet.Text(" " + searchText));
                            //this code will only execute for search text
                            if (run.RunProperties == null)
                                run.RunProperties = new RunProperties();

                            run.RunProperties.Append(new Color { Rgb = "008000" });
                            run.RunProperties.Append(new DocumentFormat.OpenXml.Spreadsheet.Bold());

                            sharedString.Append(run);
                            // in your example method argument will be  "alleged $100m money laundering syndicate at Merrylands, Guildford West"
                            if (!string.IsNullOrEmpty(stringWithoutSearchText))
                                sharedString.Append(GetRun(new DocumentFormat.OpenXml.Spreadsheet.Text(" " + stringWithoutSearchText)));
                        }
                        else
                        {
                            // in your example the method argument will be "Trio "
                            sharedString.Append(GetRun(new DocumentFormat.OpenXml.Spreadsheet.Text(" " + item)));
                        }
                    }
                }
            }
        }

Your GetRun method would look like this:

 private Run GetRun(DocumentFormat.OpenXml.Spreadsheet.Text text)
    {
        text.Space = SpaceProcessingModeValues.Preserve;

        // New Run needs to be created for each splitted line/word, run is like a row in spreadsheet
        // You cannot create a single run because you need to take care of searched text as it needs to be highlighted before adding to the row
        Run run = new Run();
        run.Append(text);
        return run;
    }

case 2:

//if search text is at end
string watch = "Bitcoin ATMs Highlight Flaws in EU Money Laundering Rules";
string searchtext = "Money Laundering Rules";
// Regex.Split on the string above yields:
// fragments[0] - "Bitcoin ATMs Highlight Flaws in EU " (the trailing space is kept)
// fragments[1] - "Money Laundering Rules"

case 3:

//if search text is at start
string watch = "Money Laundering Rules Bitcoin ATMs Highlight Flaws in EU";
string searchtext = "Money Laundering Rules";
// Regex.Split on the string above yields:
// fragments[0] - ""
// fragments[1] - "Money Laundering Rules Bitcoin ATMs Highlight Flaws in EU"

Check these three cases against the code above.
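
If you would rather get the head, the search text and the tail as separate array elements, one possible alternative (not the approach used above) is to wrap the escaped search text in a capturing group: Regex.Split then includes the captured text in its result. The sketch below is only an illustration under that assumption; SegmentDemo and SplitKeepingMatches are hypothetical names, and it prints labels instead of building OpenXML runs.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
static class SegmentDemo
{
    // With a single capturing group, Regex.Split returns the text between matches
    // at even indexes and the matched text at odd indexes.
    public static string[] SplitKeepingMatches(string text, string searchText)
    {
        string pattern = "(" + Regex.Escape(searchText) + ")";
        return Regex.Split(text, pattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        string text = "Trio charged over alleged $100m money laundering syndicate";
        string[] parts = SplitKeepingMatches(text, "charged over");

        for (int i = 0; i < parts.Length; i++)
        {
            // Empty entries appear when the match is at the very start or end.
            if (parts[i].Length == 0) continue;

            bool isMatch = i % 2 == 1; // odd index: this is the search text
            Console.WriteLine((isMatch ? "highlight: " : "plain:     ") + "\"" + parts[i] + "\"");
        }
    }
}
```

In the loop from the question, the "highlight" branch is where the colored, bold run would be created and the "plain" branch is where an unformatted run would be appended. Because the fragments keep their original spaces, no extra " " prefix should be needed.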

Reference : https://stackoverflow.com/a/521172/8652887

  • I did not understand the solution you provided. Can you please elaborate/explain using my sample code above? Please see use case 2. – Bokambo Apr 25 '20 at 00:58
  • I need three values from Regex.Split, i.e. [0] = Trio, [1] = charged over, [2] = remaining string, but the above code only returns two – Bokambo Apr 25 '20 at 01:11
  • I have updated my answer. I don't know much about your code, but I tried to fit it in. See how the logic executes 3 times in your example: the first iteration runs with "Trio"; in the second iteration the fragment contains the search text, so it is split again and the code runs twice with two different strings, i.e. one with the search text ("charged over") and one without it ("alleged $100m money laundering syndicate at Merrylands, Guildford West") – Yogesh Wavhal Apr 25 '20 at 10:17
  • You forgot to use `Regex.Escape(searchtext)` - necessary if the search text includes any character that is not matched literally. – Wyck Apr 25 '20 at 15:46