-2

I'm not good at regex and I need your help. I have a string like this: Re:_=C8SOB_Poji=9A=9Dovna and I need to split it into a string[] of individual characters, but when "=" appears, the two characters that follow it should form a single entry.

Example: string input = "Re:_=C8SOB_Poji=9A=9Dovna";

Result: string[] strs = {R,e,:,_,C8,S,O,B,_,P,o,j,i,9A,9D,o,v,n,a}

fubo
Jiří Vrbas
  • 3
    Have you tried anything yet? Hint: you don't need regular expressions. – p.s.w.g May 04 '16 at 11:48
  • The only solution I could think of was regex. What else do you mean? – Jiří Vrbas May 04 '16 at 12:08
  • 1
    what in case of `=9=9D` ? – fubo May 04 '16 at 12:12
  • Please show your best attempt. It gives us an opportunity to show you where you went wrong, which is often more useful than just giving the right answer. Even if it didn't work, some code is better than nothing. Without demonstrating that you've at least attempted to find a solution yourself, the question essentially becomes ["gimme teh codez"](http://meta.stackexchange.com/questions/108551/what-site-to-use-if-you-have-a-gimme-teh-codez-question) – p.s.w.g May 04 '16 at 12:12
  • 1
    Looks very much like *quoted-printable* encoded text. If so, you should probably use a QP parser - [C#: Class for decoding Quoted-Printable encoding?](http://stackoverflow.com/questions/2226554/c-class-for-decoding-quoted-printable-encoding) – Alex K. May 04 '16 at 12:18
  • @JiříVrbas Literally anything that can be done with a regular expression can be done with regular code using string functions. The reverse is not always true though. – juharr May 04 '16 at 12:18
  • @fubo `=9=9D` will be {=,9,9D}; each "=" must be followed by two characters from [0-9A-F] to count as a pair – Jiří Vrbas May 04 '16 at 12:18
  • @AlexK. I've never had to work with that encoding before, but I agree. That's certainly what this looks like. – p.s.w.g May 04 '16 at 12:21
  • @Alex K. Thanks for the hint. This is an encoding used in email (MIME) and I'm trying to parse it. – Jiří Vrbas May 04 '16 at 12:30

6 Answers

1

Here's a solution that doesn't use regular expressions:

private static IEnumerable<string> CustomSplit(string str)
{
    if (str == null)
    {
        yield break;
    }
    for (int i = 0; i < str.Length; i++)
    {
        if (str[i] == '=' && i < str.Length - 2 && str[i + 1] != '=' && str[i + 2] != '=')
        {
            yield return str.Substring(i + 1, 2);
            i += 2;
        }
        else
        {
            yield return str.Substring(i, 1);
        }
    }
} 

This checks that the equals sign is followed by two characters that are not themselves equals signs; otherwise it outputs the equals sign on its own and continues with the next character. Note that it does not check that the two characters are hex digits. It also returns an empty IEnumerable<string> if the string is null or empty.
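The asker's comment asks for a stricter rule: an "=" only starts a pair when the next two characters are both in [0-9A-F]. A minimal sketch of that variant follows. The names `HexAwareSplitter` and `IsUpperHex` are made up for illustration and are not part of the original answer.

```csharp
using System.Collections.Generic;

// Hypothetical helper: same loop as CustomSplit above, but the pair
// after '=' must consist of two upper-case hex digits.
public static class HexAwareSplitter
{
    // True for 0-9 and A-F only (lower-case a-f is deliberately rejected,
    // following the [0-9A-F] rule stated in the comments).
    private static bool IsUpperHex(char c)
    {
        return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F');
    }

    public static IEnumerable<string> Split(string str)
    {
        if (str == null)
        {
            yield break;
        }
        for (int i = 0; i < str.Length; i++)
        {
            if (str[i] == '=' && i + 2 < str.Length
                && IsUpperHex(str[i + 1]) && IsUpperHex(str[i + 2]))
            {
                // Emit the two hex digits as one entry and skip past them.
                yield return str.Substring(i + 1, 2);
                i += 2;
            }
            else
            {
                // Ordinary character, or an '=' that does not start a valid pair.
                yield return str.Substring(i, 1);
            }
        }
    }
}
```

With this rule, `=9=9D` splits into `=`, `9`, `9D`, which is the result the asker described.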

juharr
1

Simplest solution (?) - loop through and test for =:

using System;


public class Program
{
    public static void Main()
    {
        string  str = "Re:_=C8SOB_Poji=9A=9Dovna";
        char    ch, q;

        for(int idx=0; idx<str.Length; idx++)
        {
            // Default - interpret as a single character
            q = ' ';
            ch = str[idx];

            if(str[idx]=='=' && idx+2<str.Length)
            {
                // Assume HEX, otherwise catch and use defaults
                try {
                    ch = (char)Convert.ToInt32(str.Substring(idx+1,2),16);
                    idx+=2;
                    q = '"'; // "Quote" converted character
                }
                catch
                {
                    // Not a valid hex pair: keep the defaults and output '=' itself
                }
            }

            // Do something with result
            Console.WriteLine( "{0}{1}{2}", q, ch, q);
        }       
    }
}

Change the console output to whatever you want to do with the characters.

Check the .net fiddle.

Regards

Edit: Added conversion of the hex codes to characters ;)
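Casting each hex value straight to `char` treats the byte as a Unicode code point, which only matches the intended text for Latin-1-like data. A hedged sketch of an alternative is to collect the raw bytes and decode them with an explicit charset. The class name `HexEscapeDecoder` is hypothetical, the charset is whatever the caller passes in (the thread never names one), and literal characters are assumed to be plain ASCII.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Hypothetical helper, not from the original answer.
public static class HexEscapeDecoder
{
    public static string Decode(string input, Encoding charset)
    {
        var bytes = new List<byte>();
        for (int i = 0; i < input.Length; i++)
        {
            byte value;
            if (input[i] == '=' && i + 2 < input.Length
                && byte.TryParse(input.Substring(i + 1, 2),
                                 NumberStyles.AllowHexSpecifier,
                                 CultureInfo.InvariantCulture,
                                 out value))
            {
                // "=XX" escape: store the decoded byte and skip the two digits.
                bytes.Add(value);
                i += 2;
            }
            else
            {
                // Literal character; assumed to be ASCII in this sketch.
                bytes.Add((byte)input[i]);
            }
        }
        // Interpret the collected bytes in the caller-supplied charset.
        return charset.GetString(bytes.ToArray());
    }
}
```

If the data turns out to be Windows-1250, which is a guess based on the text looking Czech, the call would be `HexEscapeDecoder.Decode(str, Encoding.GetEncoding(1250))`. On .NET Core and later, that code page is only available after `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)` has been called. This sketch does not handle other MIME details such as soft line breaks.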

SamWhan
  • This will throw an exception if one of the last two characters is an equals sign. – juharr May 04 '16 at 12:16
  • @juharr True, but I just meant to show the general principle I was going for. It also doesn't create a string array. I left that for Jiří. But... fixed it :) – SamWhan May 04 '16 at 12:19
1

Here is a regex approach:

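using System.Linq;
using System.Text.RegularExpressions;

string input = "Re:_=C8SOB_Poji=9A=9Dovna";

// First alternative: two hex digits directly preceded by '='.
// Second alternative: any single character that is not '=' itself.
string[] strs = Regex.Matches(input, "((?<==)[0-9A-F]{2}|.(?<!=))")
                     .Cast<Match>().Select(x => x.Value).ToArray();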

I updated my answer to match your requirement that the = must be followed by two [0-9A-F] characters.

fubo
0

I solved it with Split:

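using System.Collections.Generic;

string s = @"Re:_=C8SOB_Poji=9A=9Dovna";
string[] segments = s.Split('=');
List<string> temp = new List<string>();

for (int i = 0; i < segments.Length; i++)
{
    string segment = segments[i];
    int start = 0;

    // Every segment except the first was preceded by '=' in the input.
    if (i > 0)
    {
        if (segment.Length >= 2)
        {
            // The two characters after '=' form one entry.
            temp.Add(segment.Substring(0, 2));
            start = 2;
        }
        else
        {
            // Fewer than two characters follow: keep '=' as its own entry.
            temp.Add("=");
        }
    }

    // The remaining characters are individual entries.
    foreach (char c in segment.Substring(start))
    {
        temp.Add(c.ToString());
    }
}
string[] sArr = temp.ToArray();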
s-s
0

I found a regex like this:

(?>=([0-9A-F]{2}))|(.)

I do not have enough reputation to add comments.
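The pattern above captures the two hex digits in group 1 and any other single character in group 2, so each match has to be read from whichever group succeeded. A minimal sketch of how it might be applied is below. The class name `QpRegexSplitter` is made up for illustration, and `RegexOptions.Singleline` is an addition of this sketch so that `.` also matches line breaks.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern from the answer.
public static class QpRegexSplitter
{
    // Group 1: two hex digits following '='. Group 2: any other single character.
    // Singleline is an assumption of this sketch, not part of the original pattern.
    private static readonly Regex Token =
        new Regex("(?>=([0-9A-F]{2}))|(.)", RegexOptions.Singleline);

    public static string[] Split(string input)
    {
        return Token.Matches(input ?? string.Empty)
                    .Cast<Match>()
                    // Take the hex pair if the first alternative matched,
                    // otherwise the single character from the second one.
                    .Select(m => m.Groups[1].Success ? m.Groups[1].Value
                                                     : m.Groups[2].Value)
                    .ToArray();
    }
}
```

Because the second alternative also matches a lone `=`, an input such as `=9=9D` keeps the first equals sign as its own entry.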

s-s
-1

If I got your question well, you can use this:

string[] stringarray= "Re:_=C8SOB_Poji=9A=9Dovna".ToCharArray().Select(c => c.ToString()).ToArray();
R.You
  • 2
    This just breaks the string into one-character strings. The equals sign is a special case where the two characters after it should be in one string. Also the `ToCharArray` is pointless since `string` implements `IEnumerable<char>`. – juharr May 04 '16 at 12:28