22

I am trying to create a generic formatter/parser combination.

Example scenario:

  • I have a string for string.Format(), e.g. var format = "{0}-{1}"
  • I have an array of object (string) for the input, e.g. var arr = new[] { "asdf", "qwer" }
  • I am formatting the array using the format string, e.g. var res = string.Format(format, arr)

What I am trying to do is turn the formatted string back into the array of objects (strings). Something like (pseudocode):

var arr2 = string.Unformat(format, res)

// when: res = "asdf-qwer"    
// arr2 should be equal to arr

Does anyone have experience doing something like this? I'm thinking about using regular expressions (modify the original format string, then pass it to Regex.Matches to get the array), running it for each placeholder in the format string. Is this feasible, or is there a more efficient solution?

Adrian Godong
  • How long is the string that is being unformatted? – Chris S Sep 11 '09 at 09:50
  • @Chris: within reasonable limits. ATM, I'm only using this on filenames. – Adrian Godong Sep 11 '09 at 09:52
  • Note that with the generality given, the results may be ambiguous - e.g. `format='{0}-{1}'` and `arr = { "as-df", "qw-er" }`. Could be un-formatted in three different ways. You need to define how to deal with ambiguities, or limit the contents of the format string and the value. – peterchen Sep 11 '09 at 09:54
  • You can achieve this quite easily using Regex capturing groups: https://stackoverflow.com/questions/906493/how-do-i-access-named-capturing-groups-in-a-net-regex – Alexandru Feb 19 '21 at 18:33

6 Answers

18

While the comments about lost information are valid, sometimes you just want to get the string values out of a string with known formatting.

One method is described in this blog post written by a friend of mine. He implemented an extension method called string[] ParseExact(), akin to DateTime.ParseExact(). Data is returned as an array of strings, but if you can live with that, it is terribly handy.

using System;
using System.Text.RegularExpressions;

public static class StringExtensions
{
    public static string[] ParseExact(
        this string data, 
        string format)
    {
        return ParseExact(data, format, false);
    }

    public static string[] ParseExact(
        this string data, 
        string format, 
        bool ignoreCase)
    {
        string[] values;

        if (TryParseExact(data, format, out values, ignoreCase))
            return values;
        else
            throw new ArgumentException("Format not compatible with value.");
    }

    public static bool TryExtract(
        this string data, 
        string format, 
        out string[] values)
    {
        return TryParseExact(data, format, out values, false);
    }

    public static bool TryParseExact(
        this string data, 
        string format, 
        out string[] values, 
        bool ignoreCase)
    {
        int tokenCount = 0;
        format = Regex.Escape(format).Replace("\\{", "{");

        for (tokenCount = 0; ; tokenCount++)
        {
            string token = string.Format("{{{0}}}", tokenCount);
            if (!format.Contains(token)) break;
            format = format.Replace(token,
                string.Format("(?'group{0}'.*)", tokenCount));
        }

        RegexOptions options = 
            ignoreCase ? RegexOptions.IgnoreCase : RegexOptions.None;

        Match match = new Regex(format, options).Match(data);

        // A failed match still reports group 0, so check Success explicitly.
        if (!match.Success || tokenCount != (match.Groups.Count - 1))
        {
            values = new string[] { };
            return false;
        }
        else
        {
            values = new string[tokenCount];
            for (int index = 0; index < tokenCount; index++)
                values[index] = 
                    match.Groups[string.Format("group{0}", index)].Value;
            return true;
        }
    }
}
rmtheis
ProKiner
  • What is returned in this situation: `"a-b-c".ParseExact("{0}-{1}-{0}")`? – Zarepheth Dec 15 '15 at 21:45
  • Suggestion - replace `format = format.Replace(token, string.Format("(?'group{0}'.*)", tokenCount));` with `format = format.ReplaceFirst(token, string.Format("(?'group{0}'.*)", tokenCount)); format = format.Replace(token, string.Format("\\{0}", tokenCount));`. This should better handle format strings that use input parameters multiple times. ReplaceFirst came from: http://stackoverflow.com/questions/141045/how-do-i-replace-the-first-instance-of-a-string-in-net#141076 – Zarepheth Dec 15 '15 at 22:05
  • Does not like "abc".ParseExact("{0}{1}{2}"), and @"a$--\&*b^c".ParseExact(@"{0}$--\\&*{1}^{2}") – CRice Jul 25 '16 at 01:50
  • This one is quite good: https://blogs.msdn.microsoft.com/simonince/2009/07/09/string-unformat-ive-created-a-monster/ – CRice Jul 25 '16 at 01:50
  • This worked brilliantly for parsing parameters out of Route Maps. Although I was surprised I could not find any reference to re-using whatever code the framework itself uses for this purpose. – ChrisFox Oct 08 '19 at 20:57
13

You can't unformat because information is lost. String.Format is a "destructive" algorithm, which means you can't (always) go back.

Create a new class that keeps track of the "{0}-{1}" and the { "asdf", "qwer" }, override ToString(), and adjust your code slightly. Ideally it would inherit from string, but System.String is sealed in .NET, so that is not an option.

Instead, create the class without inheriting from string and adjust your code a little more to pass the new type around.

IMO, that's the best way to do this.
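
A minimal sketch of that idea, assuming a standalone wrapper class. The name FormattedString and its members are hypothetical and only illustrate keeping the format and the arguments next to the formatted text:

```csharp
using System;

// Hypothetical wrapper: remembers the format string and the arguments
// so the original values never have to be parsed back out of the result.
public sealed class FormattedString
{
    public string Format { get; }
    public object[] Args { get; }

    public FormattedString(string format, params object[] args)
    {
        Format = format;
        Args = args;
    }

    // Produces the same text string.Format would have produced.
    public override string ToString() => string.Format(Format, Args);
}
```

With this, new FormattedString("{0}-{1}", "asdf", "qwer") renders as "asdf-qwer" through ToString(), while Args still holds the original values.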

ardila
Clement Herreman
4

It's simply not possible in the generic case. Some information will be "lost" (string boundaries) in the Format method. Assume:

String.Format("{0}-{1}", "hello-world", "stack-overflow");

How would you "Unformat" it?
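
To make the ambiguity concrete, here is a small sketch showing two different argument lists that produce the same output. The second pair of arguments is made up for illustration:

```csharp
using System;

public static class AmbiguityDemo
{
    public static void Main()
    {
        const string format = "{0}-{1}";

        // Two different inputs...
        string a = string.Format(format, "hello-world", "stack-overflow");
        string b = string.Format(format, "hello", "world-stack-overflow");

        // ...yield the same formatted string, so the boundary cannot be recovered.
        Console.WriteLine(a);      // hello-world-stack-overflow
        Console.WriteLine(a == b); // True
    }
}
```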

Mehrdad Afshari
  • Good point. How about creating a lesser than generic solution that has the assumption of no character in the format would be present in the array of objects? – Adrian Godong Sep 11 '09 at 09:50
  • Adrian: That would be ambiguous too in some cases: `String.Format("{0}{1}", "12", "3")` will return "123" but you can't infer from the format string that it was "12", "3" or "1", "23" or ... – Mehrdad Afshari Sep 11 '09 at 09:52
  • You would return an array of results and let the client deal with it. – toddmo Feb 10 '16 at 20:23
2

Assuming "-" is not in the original strings, can you not just use Split?

var arr2 = formattedString.Split('-');

Note that this only applies to the presented example with an assumption. Any reverse algorithm is dependent on the kind of formatting employed; an inverse operation may not even be possible, as noted by the other answers.

Joel Goodwin
1

A simple solution might be to

  • replace all format tokens with (.*)
  • escape all other special characters in format
  • make the regex match non-greedy

This would resolve the ambiguities to the shortest possible match.
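
The steps above could be sketched roughly as follows. The class and method names are hypothetical, and the sketch assumes the format contains only plain {n} placeholders (no alignment or format specifiers, no escaped {{ or }} braces), each used once:

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class Unformatter
{
    // Hypothetical helper: returns the captured values, or null if the input does not fit.
    public static string[] Unformat(string format, string input)
    {
        // Splitting on {n} with a capturing group keeps the indices in the result:
        // even positions are literal text, odd positions are placeholder indices.
        string[] parts = Regex.Split(format, @"\{(\d+)\}");

        var pattern = new StringBuilder("^");
        int count = 0;
        for (int i = 0; i < parts.Length; i++)
        {
            if (i % 2 == 0)
            {
                // Literal text: escape regex metacharacters.
                pattern.Append(Regex.Escape(parts[i]));
            }
            else
            {
                // Placeholder: a named, non-greedy capture group.
                int index = int.Parse(parts[i]);
                count = Math.Max(count, index + 1);
                pattern.Append("(?<g" + index + ">.*?)");
            }
        }
        pattern.Append("$");

        Match match = Regex.Match(input, pattern.ToString());
        if (!match.Success) return null;

        var values = new string[count];
        for (int index = 0; index < count; index++)
            values[index] = match.Groups["g" + index].Value;
        return values;
    }
}
```

For example, Unformatter.Unformat("{0}-{1}", "asdf-qwer") yields { "asdf", "qwer" }. Because the groups are non-greedy, an input such as "as-df-qw-er" is split at the first "-", which is one way of resolving the ambiguity but not necessarily the intended one.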

(I'm not good at RegEx, so please correct me, folks :))

peterchen
0

After formatting, you can put the resulting string and the array of objects into a dictionary with the string as key:

Dictionary<string,string []> unFormatLookup = new Dictionary<string,string []>();
...
var arr = new string [] {"asdf", "qwer" };
var res = string.Format(format, arr);
unFormatLookup.Add(res,arr);

Then, in the Unformat method, you can simply look up the string that was passed in and return the array that was used:

string [] Unformat(string res)
{
  string [] arr;
  unFormatLookup.TryGetValue(res,out arr); //you can also check the return value of TryGetValue and throw an exception if the input string is not in.
  return arr; 
}
Mehmet Aras