15

I have a template string and an array of parameters that come from different sources but need to be matched up to create a new "filled-in" string:

string templateString = GetTemplate();   // e.g. "Mr {0} has a {1}"
string[] dataItems = GetDataItems();     // e.g. ["Jones", "ceiling cat"]

string resultingString = String.Format(templateString, dataItems);
// e.g. "Mr Jones has a ceiling cat"

With this code, I'm assuming that the number of string format placeholders in the template will equal the number of data items. It's generally a fair assumption in my case, but I want to be able to produce a resultingString without failing even if the assumption is wrong. I don't mind if there are empty spaces for missing data.

If there are too many items in dataItems, the String.Format method handles it fine. If there aren't enough, I get an Exception.

To overcome this, I'm counting the number of placeholders and adding new items to the dataItems array if there aren't enough.

To count the placeholders, the code I'm working with at the moment is:

private static int CountOccurrences(string haystack)
{
    // Loop through all instances of the string "}".
    int count = 0;
    int i = 0;
    while ((i = haystack.IndexOf("}", i)) != -1)
    {
        i++;
        count++;
    }
    return count;
}

Obviously this makes the assumption that there aren't any closing curly braces that aren't being used for format placeholders. It also just feels wrong. :)

Is there a better way to count the string format placeholders in a string?


A number of people have correctly pointed out that the answer I marked as correct won't work in many circumstances. The main reasons are:

  • Regexes that count the number of placeholders don't account for literal braces ( {{0}} )
  • Counting placeholders doesn't account for repeated or skipped placeholders (e.g. "{0} has a {1} which also has a {1}")
Damovisa
  • Whilst not answering your question this post may offer an alternative that you might find interesting http://stackoverflow.com/questions/159017/named-string-formatting-in-c – Kane Jun 04 '09 at 02:30

12 Answers

18

Counting the placeholders doesn't help - consider the following cases:

"{0} ... {1} ... {0}" - needs 2 values

"{1} {3}" - needs 4 values of which two are ignored

The second example isn't farfetched.

For example, you may have something like this in US English:

String.Format("{0} {1} {2} has a {3}", firstName, middleName, lastName, animal);

In some cultures, the middle name may not be used and you may have:

String.Format("{0} {2} ... {3}", firstName, middleName, lastName, animal);

If you want to do this, you need to look for the format specifiers {index[,length][:formatString]} with the maximum index, ignoring repeated braces (e.g. {{n}}). Repeated braces are used to insert braces as literals in the output string. I'll leave the coding as an exercise :) - but I don't think it can or should be done with Regex in the most general case (i.e. with length and/or formatString).

And even if you aren't using length or formatString today, a future developer may think it's an innocuous change to add one - it would be a shame for this to break your code.

I would try to mimic the code in StringBuilder.AppendFormat (which is called by String.Format) even though it's a bit ugly - use Lutz Reflector to get this code. Basically iterate through the string looking for format specifiers, and get the value of the index for each specifier.
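A minimal sketch of that idea, written by hand rather than copied from the framework source: it walks the string once, skips doubled braces, and remembers the highest index it sees. The names FormatInspector and RequiredArgumentCount are made up for this example, and it simplifies one thing: everything between the index and the next } is skipped, so escaped braces inside a formatString component are not handled.

```csharp
using System;

public static class FormatInspector
{
    // Hypothetical helper: returns the highest placeholder index + 1,
    // or 0 when the template contains no placeholders.
    public static int RequiredArgumentCount(string format)
    {
        if (format == null) throw new ArgumentNullException(nameof(format));

        int maxIndex = -1;
        int pos = 0;
        while (pos < format.Length)
        {
            char c = format[pos++];
            if (c == '}')
            {
                // "}}" is an escaped literal brace; a lone "}" is treated as malformed here.
                if (pos < format.Length && format[pos] == '}') { pos++; continue; }
                throw new FormatException("Unmatched '}' at position " + (pos - 1));
            }
            if (c != '{') continue;

            // "{{" is an escaped literal brace.
            if (pos < format.Length && format[pos] == '{') { pos++; continue; }

            // Read the decimal index that must directly follow "{".
            // checked(...) raises OverflowException for absurdly large indexes.
            int start = pos;
            int index = 0;
            while (pos < format.Length && format[pos] >= '0' && format[pos] <= '9')
            {
                index = checked(index * 10 + (format[pos] - '0'));
                pos++;
            }
            if (pos == start) throw new FormatException("Expected an index at position " + start);

            // Simplification: skip any ",length" and ":formatString" up to the next "}".
            int close = format.IndexOf('}', pos);
            if (close < 0) throw new FormatException("Missing '}' for the placeholder at position " + (start - 1));
            pos = close + 1;

            maxIndex = Math.Max(maxIndex, index);
        }
        return maxIndex + 1;
    }
}
```

With this sketch, "{0} ... {1} ... {0}" gives 2 and "{1} {3}" gives 4, matching the two cases at the top of this answer. It should be unit tested against String.Format itself before being relied on.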

Joe
  • Yeah, great point. Teaches me I should definitely wait a bit longer before marking a correct answer. – Damovisa Jun 05 '09 at 05:49
10

Merging Damovisa's and Joe's answers. I've updated the answer after Aydsman's and activa's comments.

int count = Regex.Matches(templateString, @"(?<!\{)\{([0-9]+).*?\}(?!})")  //select all placeholders - placeholder ID as separate group
                 .Cast<Match>() // cast MatchCollection to IEnumerable<Match>, so we can use Linq
                 .Max(m => int.Parse(m.Groups[1].Value)) + 1; // select maximum value of first group (it's a placeholder ID) converted to int

This approach will work for templates like:

"{0} aa {2} bb {1}" => count = 3

"{4} aa {0} bb {0}, {0}" => count = 5

"{0} {3} , {{7}}" => count = 4

MarekBaron
  • To correctly handle literal curly braces, change the regex to ignore them: @"(?<!\{)\{([0-9]+).*?\}(?!})" This way the (valid) "{4} aa {{0}} bb {0}, {0}" string also correctly matches ignoring the second zero. – Adrian Clark Jun 04 '09 at 06:55
  • 1
    Needed to use this today, and found an issue... If there are no placeholders in a chunk of text, it falls over... So, I moved the Regex.Matches chunk to a var (var matches = Regex.Matches(inputText, @"(?<!\{)\{([0-9]+).*?\}(?!})");) and then the count after (int count = matches.Cast<Match>().Max(m => int.Parse(m.Groups[1].Value)) + 1;) checking first that matches had more than 0 results... – TiernanO Jun 01 '12 at 14:32
  • Fails for "{0} {{{3}}} , {{7}}" though - returns 1 rather than 4. If you really want to do this, I'd copy the code in `StringBuilder.AppendFormat` rather than hoping that a Regex (difficult to read and above all difficult to test) will be equivalent. – Joe Sep 15 '12 at 21:56
  • This answer is still flawed: returns 4 for "{0} {3} , {{{7}}}" when it should return 8. – Joe Oct 12 '17 at 10:55
8

You can always use Regex:

using System.Text.RegularExpressions;
// ... more code
string templateString = "{0} {2} .{{99}}. {3}"; 
Match match = Regex.Matches(templateString, 
             @"(?<!\{)\{(?<number>[0-9]+).*?\}(?!\})")
            .Cast<Match>()
            .OrderBy(m => int.Parse(m.Groups["number"].Value)) // compare numerically, so "10" sorts after "3"
            .LastOrDefault();
Console.WriteLine(match.Groups["number"].Value); // Display 3
Paulo Santos
  • For reference, the code that worked was: int len = new System.Text.RegularExpressions.Regex("{[0-9]+.*?}").Matches(template).Count; – Damovisa Jun 04 '09 at 02:48
  • The problem is that the character { and } are special in a Regular Expression, as per documentation: http://msdn.microsoft.com/en-us/library/3206d374.aspx – Paulo Santos Jun 04 '09 at 02:56
  • 3
    This won't work - it does not take account of literal braces - {{ or }} and in any case counting the number of format specifiers isn't much use - see my answer. – Joe Jun 04 '09 at 06:49
  • Ok Joe. A solution would then be to change RegEx to get captures and loop through all of them and put them into a List while checking their previous existence. – Robert Koritnik Jun 04 '09 at 07:09
  • 2
    This answer should not have been marked as the correct answer, as it is wrong. You actually need the highest numbered tag, so if {4} is the highest, 5 parameters are needed. – Philippe Leybaert Jun 04 '09 at 07:12
  • I agree: this should not be the accepted correct answer, even if the author of the original question thinks it works for him. (Or... maybe it should, because of that? At least I don't think it should be.) – peSHIr Jun 04 '09 at 07:24
  • But @activa and peSHlr - this doesn't look for the highest numbered tag, it counts the number of tags. Having said that, it doesn't account for literal braces as Joe mentioned. So yes, it is wrong. – Damovisa Jun 05 '09 at 05:48
  • Disclaimer: my sentiments are with Jamie Zawinski as far as regular expressions are concerned. I couldn't tell you if the above code works without (a) comments stating exactly what it is trying to do, and (b) a full set of unit tests for all edge cases. – Joe Jun 05 '09 at 20:47
3

Marqus' answer fails if there are no placeholders in the template string.

The addition of the .DefaultIfEmpty() and m==null conditional resolves this issue.

Regex.Matches(templateString, @"(?<!\{)\{([0-9]+).*?\}(?!})")
     .Cast<Match>()
     .DefaultIfEmpty()
     .Max(m => m==null?-1:int.Parse(m.Groups[1].Value)) + 1;
  • I'm sorry I couldn't get your version work, my unit test with a template string without placeholder fails with your linq code, although the idea looks fine (I'm learning). – barbara.post Jul 08 '14 at 14:19
3

There is a problem with the regex proposed above in that it will match on "{0}}":

Regex.Matches(templateString, @"(?<!\{)\{([0-9]+).*?\}(?!})")
...

The problem is that when looking for the closing } it uses .*? which can itself consume a } before the one that ends the match. Changing that part to stop at the first } makes the suffix check work. In other words, use this as the Regex:

Regex.Matches(templateString, @"(?<!\{)\{([0-9]+)[^\}]*?\}(?!\})")
...

I made a couple static functions based on all this, maybe you'll find them useful.

public static class StringFormat
{
    static readonly Regex FormatSpecifierRegex = new Regex(@"(?<!\{)\{([0-9]+)[^\}]*?\}(?!\})", RegexOptions.Compiled);

    public static IEnumerable<int> EnumerateArgIndexes(string formatString)
    {
        return FormatSpecifierRegex.Matches(formatString)
         .Cast<Match>()
         .Select(m => int.Parse(m.Groups[1].Value));
    }

    /// <summary>
    /// Finds all the String.Format data specifiers ({0}, {1}, etc.), and returns the
    /// highest index plus one (since they are 0-based).  This lets you know how many data
    /// arguments you need to provide to String.Format in an IEnumerable without getting an
    /// exception - handy if you want to adjust the data at runtime.
    /// </summary>
    /// <param name="formatString"></param>
    /// <returns></returns>
    public static int GetMinimumArgCount(string formatString)
    {
        return EnumerateArgIndexes(formatString).DefaultIfEmpty(-1).Max() + 1;
    }

}
JHo
3

You could "abuse" the ICustomFormatter to gather the placeholders and return them to the caller. This simply reuses the built-in parsing algorithm, instead of trying to reimplement it (and possibly deviate from the built-in algorithm).

using System;
using System.Collections.Generic;
using System.Linq;

namespace FormatPlaceholders {

    class Program {

        class FormatSnooper : IFormatProvider, ICustomFormatter {

            public object GetFormat(Type formatType) {
                return this;
            }

            public string Format(string format, object arg, IFormatProvider formatProvider) {
                Placeholders.Add(((int)arg, format));
                return null;
            }

            internal readonly List<(int index, string format)> Placeholders = new List<(int index, string format)>();

        }

        public static IEnumerable<(int index, string format)> GetFormatPlaceholders(string format, int max_count = 100) {

            var snooper = new FormatSnooper();

            string.Format(
                snooper,
                format,
                Enumerable.Range(0, max_count).Cast<object>().ToArray()
            );

            return snooper.Placeholders;

        }

        static void Main(string[] args) {
            foreach (var (index, format) in GetFormatPlaceholders("{1:foo}{4:bar}{1:baz}"))
                Console.WriteLine($"{index}: {format}");
        }

    }

}

Which prints:

1: foo
4: bar
1: baz

You can then easily find the max of index, count, find "holes" etc...
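As a rough illustration of that last step, here is a sketch that works on the kind of list GetFormatPlaceholders returns. The class and method names (PlaceholderStats, RequiredArgumentCount, DistinctCount, UnusedIndexes) are invented for this example and are not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PlaceholderStats
{
    // Highest index + 1, i.e. how many arguments String.Format needs; 0 for an empty list.
    public static int RequiredArgumentCount(IReadOnlyCollection<(int index, string format)> placeholders) =>
        placeholders.Select(p => p.index).DefaultIfEmpty(-1).Max() + 1;

    // Number of different indexes used, ignoring repeats.
    public static int DistinctCount(IReadOnlyCollection<(int index, string format)> placeholders) =>
        placeholders.Select(p => p.index).Distinct().Count();

    // Indexes below the highest one that never appear ("holes").
    public static List<int> UnusedIndexes(IReadOnlyCollection<(int index, string format)> placeholders)
    {
        var used = new HashSet<int>(placeholders.Select(p => p.index));
        return Enumerable.Range(0, RequiredArgumentCount(placeholders))
                         .Where(i => !used.Contains(i))
                         .ToList();
    }
}
```

For the placeholders printed above (1, 4, 1), this would report 5 required arguments, 2 distinct indexes, and holes at 0, 2 and 3.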


I realize I'm (years) late to the party, but I had the need for something similar to what OP asked, so I share the solution I came up with here, in case someone finds it useful...

Branko Dimitrijevic
  • Nice, this is what I needed. For me the actual index doesn't matter so "{3} is running" should return 1 and not 4. This implementation does that. – Varun Sharma Sep 15 '19 at 20:25
3

Not actually an answer to your question, but a possible solution to your problem (albeit not a perfectly elegant one); you could pad your dataItems collection with a number of string.Empty instances, since string.Format does not care about redundant items.
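Something along these lines, as a sketch only. SafeFormat, FormatPadded and the default of 10 slots are assumptions for the example; pick whatever upper bound suits your templates.

```csharp
using System;

public static class SafeFormat
{
    // Hypothetical helper: pads 'items' with string.Empty up to 'minimumLength' slots,
    // then formats. Extra slots are simply ignored by string.Format.
    public static string FormatPadded(string template, string[] items, int minimumLength = 10)
    {
        int length = Math.Max(items.Length, minimumLength);
        object[] padded = new object[length];
        for (int i = 0; i < length; i++)
        {
            // Real data first, then the shared string.Empty instance as filler.
            padded[i] = i < items.Length ? items[i] : string.Empty;
        }
        return string.Format(template, padded);
    }
}
```

Calling FormatPadded("Mr {0} has a {1}", new[] { "Jones" }) would leave the second placeholder blank instead of throwing. A template that uses an index at or above minimumLength still throws a FormatException.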

jerryjvl
  • True, and something I thought of. I would be making an assumption about the maximum number of placeholders though. That and if the count matches (which it usually does), it's a bit of a waste of time and space... – Damovisa Jun 04 '09 at 02:45
  • How much waste it is depends a bit on how you create the 'dataItems' array... if you are constructing it in a 'new' already then the waste of time will be really negligible, and the waste of space is limited by the fact that you use a reference to 'string.Empty', which is a single instance no matter how often you refer to it; as long as the array does not stay around very long the scope of the space waste is really also fairly minimal... all of this obviously depends strongly on how and how often these arrays are created. – jerryjvl Jun 04 '09 at 02:56
3

Perhaps you are trying to crack a nut with a sledgehammer?

Why not just put a try/catch around your call to String.Format?

It's a bit ugly, but solves your problem in a way that requires minimal effort, minimal testing, and is guaranteed to work even if there is something else about formatting strings that you didn't consider (like {{ literals, or more complex format strings with non-numeric characters inside them: {0:$#,##0.00;($#,##0.00);Zero})

(And yes, this means you won't detect more data items than format specifiers, but is this a problem? Presumably the user of your software will notice that they have truncated their output and rectify their format string?)
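For illustration, a small sketch of what that might look like. TolerantFormat, FormatOrFallback and the fallback parameter are names invented here; what to return when formatting fails is up to you.

```csharp
using System;

public static class TolerantFormat
{
    // Hypothetical helper: returns the formatted string, or 'fallback' when the
    // template asks for more arguments than were supplied or is otherwise malformed.
    public static string FormatOrFallback(string template, string[] items, string fallback = "")
    {
        try
        {
            // The cast makes it explicit that the array is the argument list.
            return string.Format(template, (object[])items);
        }
        catch (FormatException)
        {
            // Thrown for an out-of-range index as well as for a malformed template.
            return fallback;
        }
    }
}
```

Unlike the padding approaches, this gives you no partially filled string when the data is short, so the fallback needs to be something the caller can live with.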

Jason Williams
1

Very Late to the question, but happened upon this from another tangent.

String.Format is problematic even with unit testing (e.g. a missing argument). A developer puts in the wrong positional placeholder, or the format string is edited, and it compiles fine; but the string is used in another code location or even another assembly, and you get the FormatException at runtime. Ideally unit tests or integration tests should catch this.

While this isn't a solution to the question, it is a workaround. You can make a helper method that accepts the format string and a list (or array) of objects. Inside the helper method, pad the list to a predefined fixed length that exceeds the number of placeholders in your messages. In the example below, assume that 10 placeholders is sufficient. The padding element can be null or a string like "[Missing]".

int q = 123456, r = 76543;
List<object> args = new List<object>() { q, r};     

string msg = "Sample Message q = {2:0,0} r = {1:0,0}";

//Logic inside the helper function
int upperBound = args.Count;
int max = 10;

for (int x = upperBound; x < max; x++)
{
    args.Add(null); //"[No Value]"
}
//Return formatted string   
Console.WriteLine(string.Format(msg, args.ToArray()));

Is this ideal? Nope, but for logging or some use cases it is an acceptable alternative to prevent the runtime exception. You could even replace the null element with "[No Value]" and/or add array positions then test for No Value in the formatted string then log it as an issue.

Charles Byrne
1

Based on this answer and David White's answer here is an updated version:

string formatString = "Hello {0:C} Bye {{300}} {0,2} {34}";
//string formatString = "Hello";
//string formatString = null;

int n;
var countOfParams = Regex.Matches(formatString?.Replace("{{", "").Replace("}}", "") ?? "", @"\{([0-9]+)")
    .OfType<Match>()
    .DefaultIfEmpty()
    .Max(m => Int32.TryParse(m?.Groups[1]?.Value, out n) ? n : -1 )
    + 1;

Console.Write(countOfParams);

Things to note:

  1. Replacing is a more straightforward way to take care of double curly braces. This is similar to how StringBuilder.AppendFormatHelper takes care of them internally.
  2. As we are eliminating '{{' and '}}', the regex can be simplified to '\{([0-9]+)'
  3. This will work even if formatString is null
  4. This will work even if there is an invalid format, say '{3444444456}'. Normally this will cause integer overflow.
Anton Krouglov
1

Since I don't have the authority to edit posts, I'll propose my shorter (and correct) version of Marqus' answer:

int num = Regex.Matches(templateString,@"(?<!\{)\{([0-9]+).*?\}(?!})")
             .Cast<Match>()
             .Max(m => int.Parse(m.Groups[1].Value)) + 1; // Groups[1] is the captured index; Groups[0] would be the whole "{n}" match

I'm using the regex proposed by Aydsman, but haven't tested it.

Philippe Leybaert
0

You could use a regular expression to count the {} pairs that have only the formatting you'll use between them. @"\{\d+\}" is good enough, unless you use formatting options.
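A quick sketch of that, with SimplePlaceholderCounter and CountSimplePlaceholders as made-up names for the example:

```csharp
using System.Text.RegularExpressions;

public static class SimplePlaceholderCounter
{
    // Counts occurrences of "{digits}" only. Placeholders with a length or
    // format string such as {0:C} are not counted, and the "{0}" inside a
    // literal "{{0}}" is counted even though String.Format treats it as text.
    public static int CountSimplePlaceholders(string template) =>
        Regex.Matches(template, @"\{\d+\}").Count;
}
```

So "Mr {0} has a {1}" gives 2, while "{0:C} {1}" gives only 1. Note this counts placeholders rather than finding the highest index, so it only tells you the argument count for templates that use each index from 0 upward exactly once.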

John Fisher