
I have a set of phrases that I need to replace with other phrases, for example:

abc => cde
ab df => de
...

And I have a text in which to make the changes. However, I have no way of knowing the case of that text beforehand. So, for example, if I have:

A bgt abc hyi. Abc Ab df h

I must replace and get:

A bgt cde hyi. Cde De h

Or as close to that as possible, i.e. keep the case.

EDIT: As I am seeing too much confusion about this, I will try to clarify a bit:

I am asking about a way to keep capitalization after replacing, and I don't think that came through well (I didn't explain well what that entails), so I will give a more realistic example using real words.

Think of it like a glossary, replacing expressions with their synonyms, so to speak. So if I map:

didn't achieve success => failed miserably

and then I get as input the sentence:

As he didn't achieve success, he was fired

I would get

As he failed miserably, he was fired

But if "didn't" was capitalized, so would "failed" be; if "achieve" or "success" was capitalized, so would "miserably" be; if any had more than one letter capitalized, so would its counterpart.

My main possibilities are (the ones I really want to take into consideration):

  • only first letter of first word capitalized
  • only first letter of every word capitalized
  • all letters capitalized

If I can handle those three, that would already be acceptable, I guess (they're the easier ones). Of course, a more in-depth solution would be better if available.

Any ideas?

537mfb
  • Are you concerned only with the capitalization of the initial letter? For example, what should happen if you see `"ab Df"`? – Sergey Kalinichenko Jun 19 '12 at 15:56
  • Will sentences only end with `.` or could it end with other punctuation such as `!` or `?`? – Abe Miessler Jun 19 '12 at 15:58
  • @dasblinkenlight: interesting question - in the case of the example it would be all the same to me whether it was cased up or down (like I said, as close as possible) - however, if ab df was to be replaced by de hj then I would require it to be de Hj – 537mfb Jun 19 '12 at 16:23
  • @AbeMiessler end with any or no punctuation at all - i have no control over input – 537mfb Jun 19 '12 at 16:24
  • Also, concerning the `ab df -> de` case, what happens if it's `AB DF`? Would the result be `DE`? What about `aB Df` or `Ab dF` or other possible combinations of upper/lower case? It probably is important to know. – Sephallia Jun 19 '12 at 16:45
  • @Sephallia - you are correct on your first, and on your second it would be a toss-up of sorts - I've edited my question both to clarify and to reduce the scope of the question – 537mfb Jun 19 '12 at 17:02
  • Thanks! I'm working on an answer right now. It's a lot more complicated than I made it out to be. But you definitely made it easier! – Sephallia Jun 19 '12 at 17:53
  • @537mfb I don't know if you get notified of edits to my answer.. but I modified it after I read your updated question. – Elias Jun 19 '12 at 20:59

5 Answers


Not sure how well this will work, but this is what I came up with:

        string input = "A bgt abc hyi. Abc Ab df h";
        Dictionary<string, string> map = new Dictionary<string, string>();
        map.Add("abc", "cde");
        map.Add("ab df", "de");

        string temp = input;
        foreach (var entry in map)
        {
            string key = entry.Key;
            string value = entry.Value;
            // Escape the key so any regex metacharacters in it are matched literally.
            temp = Regex.Replace(temp, Regex.Escape(key), match =>
            {
                bool isUpper = char.IsUpper(match.Value[0]);

                char[] result = value.ToCharArray();
                result[0] = isUpper
                    ? char.ToUpper(result[0])
                    : char.ToLower(result[0]);
                return new string(result);
            }, RegexOptions.IgnoreCase);
        }
        label1.Text = temp; // output is A bgt cde hyi. Cde De h

EDIT: After reading the modified question, here's my modified code (it turns out to follow similar steps to @Sephallia's code, with similar variable names lol).

The code is now a bit more complicated, but I think it's OK.

        string input = 
        @"As he didn't achieve success, he was fired.
        As he DIDN'T ACHIEVE SUCCESS, he was fired.
        As he Didn't Achieve Success, he was fired.
        As he Didn't achieve success, he was fired.";
        Dictionary<string, string> map = new Dictionary<string, string>();
        map.Add("didn't achieve success", "failed miserably");


        string temp = input;
        foreach (var entry in map)
        {
            string key = entry.Key;
            string value = entry.Value;
            // Escape the key so any regex metacharacters in it are matched literally.
            temp = Regex.Replace(temp, Regex.Escape(key), match =>
            {
                bool isFirstUpper, isEachUpper, isAllUpper;

                string sentence = match.Value;
                char[] sentenceArray = sentence.ToCharArray();

                string[] words = sentence.Split(' ');

                isFirstUpper = char.IsUpper(sentenceArray[0]);

                isEachUpper = words.All(w => char.IsUpper(w[0]) || !char.IsLetter(w[0]));

                isAllUpper = sentenceArray.All(c => char.IsUpper(c) || !char.IsLetter(c));

                if (isAllUpper)
                    return value.ToUpper();

                if (isEachUpper)
                {
                    // capitalize first of each word... use regex again :P
                    string capitalized = Regex.Replace(value, @"\b\w", charMatch => charMatch.Value.ToUpper());
                    return capitalized;
                }


                char[] result = value.ToCharArray();
                result[0] = isFirstUpper
                    ? char.ToUpper(result[0])
                    : char.ToLower(result[0]);
                return new string(result);
            }, RegexOptions.IgnoreCase);
        }
        textBox1.Text = temp; 
        /* output is :
        As he failed miserably, he was fired.
        As he FAILED MISERABLY, he was fired.
        As he Failed Miserably, he was fired.
        As he Failed miserably, he was fired.
        */
Elias
  • interesting - but it seems to be able to upcase only when the caps is on the first letter - maybe I wasn't so clear in my question - I need to keep caps - so if the sentence had ABc somewhere it would need to be changed to CDe - however that could be achieved only up to a point - only when the lengths matched I guess - will look deeper into your solution to see what I can pull out of it - if anything – 537mfb Jun 19 '12 at 16:39
  • As 537mfb mentioned, it doesn't work for ABc -> CDe as the result would be Cde. For cases where the `key` and the `value` are the same length, this is easily resolved using a for loop using the `.Length` property as the end condition. It may be necessary to have more clarification regarding what happens in the `ab df -> de` case. What happens depending on what letters are capitalized in `ab df`? (Posed this question on the Original Question as well) – Sephallia Jun 19 '12 at 16:47
  • @Sephallia edited the question in an attempt to clarify AND reduce scope of the problem – 537mfb Jun 19 '12 at 17:17
  • Hey Elias! If you look at my code, I certainly said that I copied your original code (before the edit). I hope that was okay! You use Regex a lot more than me ^^;. It's awesome because the place we branch off is the place that I coded myself :p. Anyway, sorry for kinda taking it without asking first, but we're all just trying to help the original poster, so I hope that's okay! – Sephallia Jun 20 '12 at 01:31
  • @Sephallia Oh no no I didn't mean it this way. What I meant was that after I modified my code I noticed that you modified it too, and our modifications follow the same steps (and even the variable names I added were close).. I meant it as a good thing. Sorry for the misunderstanding. So anyone tested the code on different examples? – Elias Jun 20 '12 at 01:55
  • Ohh okay! It's hard to tell on the internet @_@. Haha, maybe we think alike? – Sephallia Jun 20 '12 at 02:53
  • interesting - will test yours and @Sephallia's and let you know - Also upvoted both of you - thnx – 537mfb Jun 20 '12 at 09:13
  • Since both @Elias's and Sephallia's work or fail for the same cases in my test scenario (while both work on my reduced scope), and Elias's being a bit simpler and Sephallia's based on Elias's, I think it only fair to accept Elias's answer – 537mfb Jun 20 '12 at 14:27

You could use String.IndexOf with StringComparison.CurrentCultureIgnoreCase specified to find a match. At that point, a character by character replacement would work to do the swap. The capitalization could be handled by checking with Char.IsUpper for the source character, and then using Char.ToUpper or Char.ToLower on the destination as appropriate.
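
A minimal sketch of that idea, assuming a hypothetical helper named `ReplaceKeepCase` (it is not from the question or from any library). It uses `StringComparison.OrdinalIgnoreCase` instead of the culture-sensitive comparison so that a match always has the same length as the search string, and where the replacement is longer than the match it simply leaves the extra characters as written, which is an arbitrary choice:

```csharp
using System;
using System.Text;

static class CaseKeepingReplace
{
    // Hypothetical helper: replaces every case-insensitive occurrence of `find`
    // and copies upper/lower case from the matched text, position by position.
    public static string ReplaceKeepCase(string text, string find, string replacement)
    {
        if (string.IsNullOrEmpty(find)) return text;

        var sb = new StringBuilder();
        int pos = 0;
        while (true)
        {
            int hit = text.IndexOf(find, pos, StringComparison.OrdinalIgnoreCase);
            if (hit < 0) break;

            sb.Append(text, pos, hit - pos);        // unchanged text before the match
            for (int i = 0; i < replacement.Length; i++)
            {
                char c = replacement[i];
                if (i < find.Length)                // a source character exists at this offset
                {
                    // Anything that is not upper case in the source (including
                    // spaces and punctuation) yields a lower-case target character.
                    c = char.IsUpper(text[hit + i]) ? char.ToUpper(c) : char.ToLower(c);
                }
                sb.Append(c);
            }
            pos = hit + find.Length;                // continue after the match
        }
        sb.Append(text, pos, text.Length - pos);    // remainder after the last match
        return sb.ToString();
    }

    static void Main()
    {
        // prints "A bgt cde hyi. Cde CDe"
        Console.WriteLine(ReplaceKeepCase("A bgt abc hyi. Abc ABc", "abc", "cde"));
    }
}
```

Because case is copied by position, this only gives sensible results when the match and the replacement line up letter for letter; it does not attempt the per-word rules described in the edited question.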

Reed Copsey
  • This does not touch on capitalizing the letters when replacing. – Justin Pihony Jun 19 '12 at 16:08
  • @JustinPihony Sorry - thought it was clear enough in my original - the edit help? – Reed Copsey Jun 19 '12 at 16:10
  • Yeah, that was not even touched on in your previous answer, so I don't see how that was clear. It is good now. The solution is not elegant, but I can't think of a way to do this elegantly... looping structures must be had – Justin Pihony Jun 19 '12 at 16:12
  • how will this cover my example where ab df becomes de? character by character doesn't seem like a good option - especially when I am not converting characters but replacing words inside sentences - or am I just not understanding your solution? – 537mfb Jun 19 '12 at 16:27
  • @537mfb Well, it wasn't exactly clear how you want to do the substitution in the first place. How would you handle "ab cd"->"ef" if the original was "Ab Cd"? There aren't enough letters in the new string to match cases - this gives you the tools to handle it however you choose, though. – Reed Copsey Jun 19 '12 at 16:34
  • @ReedCopsey - sorry - edited the question in an attempt to clarify AND reduce the scope of the problem - hoping that will bring out some more answers – 537mfb Jun 19 '12 at 17:03

You could loop through the string as an array of characters and use Char.IsUpper(char) to check the case of each one.

  1. Instantiate a blank string
  2. Set up a loop to loop through the characters
  3. Check if you need to change the character to a different one
    1. Yes: Check whether or not the character is upper or lower case, depending on the result, put the appropriate letter in the new string.
    2. No: Just throw that character into the new string
  4. Set the original string to the new string.

Might not be the most efficient or spectacular way of doing things, but it is simple, and it works.
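
The steps above could be sketched like this, assuming a hypothetical lowercase-to-lowercase character map (`map`) and a hypothetical helper name (`MapKeepCase`); neither comes from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class CharMapExample
{
    // Hypothetical helper: `map` holds lowercase source -> lowercase target characters.
    public static string MapKeepCase(string input, IDictionary<char, char> map)
    {
        var result = new StringBuilder(input.Length);          // step 1: blank result
        foreach (char c in input)                              // step 2: loop over characters
        {
            char target;
            if (map.TryGetValue(char.ToLower(c), out target))  // step 3: needs changing?
            {
                // Yes: emit the mapped letter in the same case as the original.
                result.Append(char.IsUpper(c) ? char.ToUpper(target) : target);
            }
            else
            {
                result.Append(c);                              // No: copy it unchanged
            }
        }
        return result.ToString();                              // step 4: caller reassigns
    }

    static void Main()
    {
        var map = new Dictionary<char, char> { { 'a', 'c' }, { 'b', 'd' }, { 'c', 'e' } };
        Console.WriteLine(MapKeepCase("Abc xyz ABC", map));    // prints "Cde xyz CDE"
    }
}
```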

On a side note: I am not sure how you are converting the characters, but suppose you are shifting them along the alphabet by a constant amount (when you DO want to convert them), say by 3, so a -> d and E -> H. Then you could get the ASCII value of the character, add 3 (if you want to convert it), and get the character back from the ASCII value. As described here. You would have to add checks, though, to make sure that you wrap around from the end of the alphabet (or from the beginning, if you're shifting left).
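
A rough sketch of that shift, assuming plain ASCII letters only and a hypothetical helper named `ShiftLetters` (not part of any library). The modulo arithmetic handles the wrap-around in both directions:

```csharp
using System;
using System.Text;

static class ShiftExample
{
    // Hypothetical helper: shifts ASCII letters by `shift` places, wrapping
    // around the alphabet and keeping each letter's case.
    public static string ShiftLetters(string input, int shift)
    {
        int k = ((shift % 26) + 26) % 26;   // normalise so negative (left) shifts work too
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c >= 'a' && c <= 'z')
                sb.Append((char)('a' + (c - 'a' + k) % 26));
            else if (c >= 'A' && c <= 'Z')
                sb.Append((char)('A' + (c - 'A' + k) % 26));
            else
                sb.Append(c);               // non-letters are copied unchanged
        }
        return sb.ToString();
    }

    static void Main()
    {
        Console.WriteLine(ShiftLetters("a E xyz", 3));  // prints "d H abc"
    }
}
```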

Edit #1: (Going to keep the above there)

Really big block of code... Sorry! This was the best way I could see to do what you were asking. Hopefully someone might come up with a more elegant way. Please do comment or anything if you require clarification!

    // (to be clear) This is Elias' (original) code modified.
    static void Main(string[] args)
    {
        string input = "As he DIDN'T ACHIEVE Success, he was fired";
        Dictionary<string, string> map = new Dictionary<string, string>();
        map.Add("didn't achieve success", "failed miserably");

        string temp = input;
        foreach (var entry in map)
        {
            string key = entry.Key;
            string value = entry.Value;
            // Escape the key so any regex metacharacters in it are matched literally.
            temp = Regex.Replace(temp, Regex.Escape(key), match =>
            {
                string[] matchSplit = match.Value.Split(' ');
                string[] valueSplit = value.Split(' ');

                // Set the number of words to the lower one.
                // If they're the same, it doesn't matter.
                int numWords = (matchSplit.Length <= valueSplit.Length) 
                    ? matchSplit.Length
                    : valueSplit.Length;

                // only first letter of first word capitalized
                // only first letter of every word capitalized
                // all letters capitalized
                char[] result = value.ToCharArray();
                for (int i = 0; i < numWords; i++)
                {
                    if (char.IsUpper(matchSplit[i][0]))
                    {
                        bool allIsUpper = true;
                        int c = 1;
                        while (allIsUpper && c < matchSplit[i].Length)
                        {
                            if (!char.IsUpper(matchSplit[i][c]) && char.IsLetter(matchSplit[i][c]))
                            {
                                allIsUpper = false;
                            }
                            c++;
                        }
                        // if all the letters of the current word are upper case, allIsUpper will be true.
                        int arrayPosition = ArrayPosition(i, valueSplit);
                        Console.WriteLine(arrayPosition);
                        if (allIsUpper)
                        {
                            for (int j = 0; j < valueSplit[i].Length; j++)
                            {
                                result[j + arrayPosition] = char.ToUpper(result[j + arrayPosition]);
                            }
                        }
                        else
                        {
                            // The first letter.
                            result[arrayPosition] = char.ToUpper(result[arrayPosition]);
                        }
                    }
                }

                return new string(result);
            }, RegexOptions.IgnoreCase);
        }
        Console.WriteLine(temp); 
    }

    // Returns the index in the value string at which word i starts.
    public static int ArrayPosition(int i, string[] valueSplit)
    {
        if (i > 0)
        {
            return valueSplit[i-1].Length + 1 + ArrayPosition(i - 1, valueSplit);
        }

        return 0;
    }
Sephallia
  • see the example - ab df becomes de while abc becomes cde - obviously it's not about character conversion but rather sub-sentence replacement - as I stated in the question – 537mfb Jun 19 '12 at 16:30
  • @537mfb Ah, sorry about this. Focused too much on the abc -> cde. I'll be editing my answer (might take a bit). – Sephallia Jun 19 '12 at 16:32
  • @537mfb Actually, looking at Elias' answer, that's pretty much what I would do. I'll put a comment in Elias' answer. – Sephallia Jun 19 '12 at 16:42
  • @537mfb I'm not sure if it notifies you when I edit my answer, but I have edited my answer. – Sephallia Jun 19 '12 at 18:20
  • as i commented on @Elias's answer, i will test yours and his and let you know . also upvoted both of you - great help – 537mfb Jun 20 '12 at 09:15
  • Since both Elias's and @Sephallia's work or fail for the same cases in my test scenario (while both work on my reduced scope), and Elias's being a bit simpler and Sephallia's based on Elias's, I think it only fair to accept Elias's answer – 537mfb Jun 20 '12 at 14:28

Replace one char at a time and use

if (char.IsUpper(currentChar))
{
   //replace with upper case variant 
}
Jambobond

This is pretty much what Reed was saying. The only trick is that I'm not sure what you should do when the Find and Replace strings are different lengths. So I'm choosing the min length and using that...

static string ReplaceCaseInsensitive(string Text, string Find, string Replace)
{
    char[] NewText = Text.ToCharArray();
    int ReplaceLength = Math.Min(Find.Length, Replace.Length);

    int LastIndex = -1;
    while (true)
    {
        LastIndex = Text.IndexOf(Find, LastIndex + 1, StringComparison.CurrentCultureIgnoreCase);

        if (LastIndex == -1)
        {
            break;
        }
        else
        {
            for (int i = 0; i < ReplaceLength; i++)
            {
                if (char.IsUpper(Text[i + LastIndex])) 
                    NewText[i + LastIndex] = char.ToUpper(Replace[i]);
                else
                    NewText[i + LastIndex] = char.ToLower(Replace[i]);
            }
        }
    }

    return new string(NewText);
}
Steve Wortham
  • naturally my example was silly and simplified, and that is obviously leading to some issues for people answering - I will edit my question - the big problem with character by character is in the following example - afg dr => ft hju - in this case if the input is Afg Dr then the result should be Ft Hju (keep caps = 1st letter caps as in the original for this case) while AFG Dr should become FT Hju – 537mfb Jun 19 '12 at 16:46