
We're trying to generate user-friendly report names from SSRS report filenames. The filenames can't contain spaces, but it should be possible to recover readable names from them with a regex that scans for lowercase/uppercase boundaries and letter/digit boundaries. However, runs of capitals (acronyms) complicate things. For example:

ListOfMembers => List Of Members
Weekly123Report => Weekly 123 Report
MembersOfIEETeam => Members Of IEE Team

So I think this is the minimum rule set:

([a-z0-9])([A-Z])        gets replaced with "$1 $2"
([A-Z])([A-Z])([a-z])    gets replaced with "$1 $2$3"
([A-Za-z])([0-9])        gets replaced with "$1 $2"
([0-9])([A-Za-z])        gets replaced with "$1 $2"
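A minimal C# sketch of these rules applied as four sequential `Regex.Replace` passes. It assumes rule 2 is meant to put the space before the last capital of an acronym, i.e. `([A-Z])([A-Z])([a-z])` replaced with `"$1 $2$3"`; the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: the question's four rules as separate replacement passes.
// Rule 2 is written as ([A-Z])([A-Z])([a-z]) -> "$1 $2$3" (an assumption about
// the intent) so the space lands before the last capital of an acronym.
public static class MultiPassSplitter
{
    // Applied in order; each pass sees the output of the previous one.
    private static readonly (Regex Pattern, string Replacement)[] Rules =
    {
        (new Regex("([a-z0-9])([A-Z])"), "$1 $2"),        // lower/digit -> upper
        (new Regex("([A-Z])([A-Z])([a-z])"), "$1 $2$3"),  // acronym -> next word
        (new Regex("([A-Za-z])([0-9])"), "$1 $2"),        // letter -> digit
        (new Regex("([0-9])([A-Za-z])"), "$1 $2"),        // digit -> letter
    };

    public static string Split(string name)
    {
        foreach (var (pattern, replacement) in Rules)
        {
            name = pattern.Replace(name, replacement);
        }
        return name;
    }
}
```

With the rules applied in this order, `MultiPassSplitter.Split("MembersOfIEETeam")` returns `Members Of IEE Team` and `Split("Weekly123Report")` returns `Weekly 123 Report`.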

Is it possible to do this in one fell swoop, or will it take multiple passes? Suppose we had a report filename like:

WeeklyIEEReportWC20090103SortedByDate
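On the one-pass question: each rule only decides whether a space belongs between two adjacent characters, so it can be restated as a zero-width lookbehind/lookahead pair, and the four pairs can be joined with `|` into a single pattern. A sketch under that assumption (the class name is hypothetical):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: the four boundaries as zero-width lookaround pairs,
// so a single Replace inserts a space at every position that matches.
public static class SinglePassSplitter
{
    private static readonly Regex Boundary = new Regex(
        "(?<=[a-z0-9])(?=[A-Z])" +    // lower/digit -> upper
        "|(?<=[A-Z])(?=[A-Z][a-z])" + // acronym -> next word (IEE|Team)
        "|(?<=[A-Za-z])(?=[0-9])" +   // letter -> digit
        "|(?<=[0-9])(?=[A-Za-z])");   // digit -> letter

    public static string Split(string name) => Boundary.Replace(name, " ");
}
```

Because every match is empty, `Replace` inserts at most one space per position even where two alternatives agree (for example between `3` and `S`, which satisfies both the digit-to-upper and digit-to-letter checks). For the filename above this yields `Weekly IEE Report WC 20090103 Sorted By Date`.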

I've seen SSRS do something similar with series names on charts: it generates them on the fly from the concatenated version.

Any info appreciated! :)

Bobby Tables
  • You don't have a pattern. The third sample can be ambiguous between "Members Of IEE Team" and "Members Of IEETeam" if you are using the uppercase characters as separators/delimiters. You can't do that in a generic and clean way. – Tocco Jul 14 '11 at 15:53
  • @Tocco, OP states that in case of `IEETeam` he'd want `IEE Team`, so there is no problem. – Qtax Jul 14 '11 at 16:08
  • @Qtax, OK. So it will work. +1 for your answer. – Tocco Jul 14 '11 at 16:11
  • Excellent code, many thanks Qtax and agent-j. Qtax, if you can modify your regex so it copes with capitals at the start of the string, I'll switch the accepted answer back to yours, as you got it first. Once again many thanks, guys; I'm more of a DBA, so the C# help has been invaluable! :) – Bobby Tables Jul 15 '11 at 13:22

2 Answers


My interpretation and solution:

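using System.Text.RegularExpressions;
var input = "WeeklyIEEReportWC20090103SortedByDateXFoo3W3CBar4x";
// Prefix a space to each capitalized word, acronym/alphanumeric run (IEE, W3C), or digit run; never at index 0
var re = @"(?!^)(?:[A-Z](?:[a-z]+|(?:[A-Z\d](?![a-z]))*)|\d+)";
string value = Regex.Replace(input, re, " $0");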

Result: Weekly IEE Report WC20090103 Sorted By Date X Foo 3 W3C Bar 4x
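Regarding capitals at the start of the string (raised in the question's comments): the `(?!^)` guard stops a token from matching at index 0, so for `IEEReport` the first token begins at index 1 and the result is `I EE Report`. One possible workaround, sketched here and not part of the original answer, is to drop the guard so the first token is matched whole, then trim the leading space that `Replace` adds. The class, method, and constant names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class LeadingCapsSplitter
{
    // The pattern as posted: (?!^) forbids a token starting at index 0.
    private const string Guarded =
        @"(?!^)(?:[A-Z](?:[a-z]+|(?:[A-Z\d](?![a-z]))*)|\d+)";

    // Hypothetical variant: same token alternatives, guard removed.
    private const string Unguarded =
        @"[A-Z](?:[a-z]+|(?:[A-Z\d](?![a-z]))*)|\d+";

    public static string SplitOriginal(string name) =>
        Regex.Replace(name, Guarded, " $0");

    // A token at index 0 now also gets a leading space;
    // TrimStart removes that space from the front of the result.
    public static string Split(string name) =>
        Regex.Replace(name, Unguarded, " $0").TrimStart();
}
```

`LeadingCapsSplitter.Split("IEEReport")` gives `IEE Report`, and the longer sample still produces the result shown above. Note that `TrimStart()` also strips any whitespace the input already started with.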

Qtax
  • @tim-pietzcker, see the comment above. – Tocco Jul 14 '11 at 16:04
  • @Tocco: How can you maintain your position when Qtax has proven you wrong by providing a working regex? – Tim Pietzcker Jul 14 '11 at 16:08
  • @Tocco: No, because the OP has unambiguously specified how to handle this. – Tim Pietzcker Jul 14 '11 at 16:11
  • It shouldn't have been ambiguous: [A-Z][A-Z] is treated as an acronym, so the last capital in a run belongs to the next word. Thus WindowsNTServicePack should always be interpreted as Windows NT Service Pack, never Windows NTS ervice Pack. – Bobby Tables Jul 15 '11 at 13:09
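As a quick check of that acronym rule, here is a sketch that runs both answers' patterns, unchanged but wrapped in hypothetical helper methods, on `WindowsNTServicePack`. Each keeps `NT` together and starts the next word at `S`.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helpers wrapping the two answers' patterns unchanged.
public static class AcronymCheck
{
    // Qtax: prefix each token (word, acronym/alphanumeric run, digits) with a space.
    public static string Qtax(string name) =>
        Regex.Replace(name, @"(?!^)(?:[A-Z](?:[a-z]+|(?:[A-Z\d](?![a-z]))*)|\d+)", " $0");

    // agent-j: insert a space at each zero-width boundary.
    public static string AgentJ(string name) =>
        Regex.Replace(name,
            @"(?<=[A-Z])(?=[A-Z][a-z])|(?<=[a-z0-9])(?=[A-Z])|(?<=[a-zA-Z])(?=[0-9])",
            " ");
}
```

Both calls return `Windows NT Service Pack`: in each pattern the capital followed by a lowercase letter (`S` in `Service`) is what ends the acronym.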

Edit 2: fixed IEE.

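using System.Text.RegularExpressions;
var input = @"WeeklyIEEReportWC20090103SortedByDate";
// Zero-width boundaries: UPPER|Upper-lower, lower-or-digit|UPPER, letter|digit
string p = @"(?<=[A-Z])(?=[A-Z][a-z])|(?<=[a-z0-9])(?=[A-Z])|(?<=[a-zA-Z])(?=[0-9])";
string value = Regex.Replace(input, p, " ");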

This produces "Members Of IEE Team" and "Weekly IEE Report WC 20090103 Sorted By Date" for the samples provided.
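A small sketch wrapping this pattern in a hypothetical helper and checking it against the question's samples. Note that it has no digit-to-lowercase alternative, so unlike the question's fourth rule it leaves something like `Week1day` as `Week 1day`; appending `|(?<=[0-9])(?=[a-z])` to the pattern would cover that case.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper around agent-j's pattern, unchanged.
public static class BoundarySplitter
{
    private static readonly Regex Boundary = new Regex(
        @"(?<=[A-Z])(?=[A-Z][a-z])|(?<=[a-z0-9])(?=[A-Z])|(?<=[a-zA-Z])(?=[0-9])");

    // Insert one space at each matching zero-width position.
    public static string Split(string name) => Boundary.Replace(name, " ");
}
```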

agent-j
  • @Tocco, what do you mean it can't be done with regex? It will never be as good as a human, but I'm not sure it has to be. – agent-j Jul 14 '11 at 15:57
  • "IEEReport": This part of the string can have many interpretations. So, I think that will resolve part of the problem. – Tocco Jul 14 '11 at 16:03
  • @Tocco, my regex treats consecutive uppercase letters as an abbreviation. I think you're right about this expression `IEisAGoodBrowser`... my regex produces `I Eis A Good Browser`. – agent-j Jul 14 '11 at 16:15