I am working on string manipulation using regex.

   Source: string value = @"/webdav/MyPublication/Building%20Blocks/folder0/folder1/content_1.xml";

   output required:
           Foldername: folder1
           content name: content
           folderpath:/webdav/MyPublication/Building%20Blocks/folder0/folder1/

I am new to this. Can anyone explain how this can be done using regex? Thank you.

svick
Patan

3 Answers


You could use named captures, but you're probably better off (from a security and implementation aspect) just using the Uri class.
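As a minimal sketch of the Uri approach (not code from this answer): `Uri.Segments` only works on absolute URIs, so the path is resolved against a placeholder base first. The class name `WebDavPathParts`, the method `Parse`, and the base `http://localhost` are assumptions made for illustration, and the content name is assumed to end at the first '_' when one is present.

```csharp
using System;
using System.IO;
using System.Linq;

public static class WebDavPathParts
{
    // Hypothetical helper: splits a root-relative path using System.Uri.
    public static (string FolderName, string ContentName, string FolderPath) Parse(string path)
    {
        // Uri.Segments throws for relative URIs, so resolve against a placeholder base.
        // "http://localhost" is an assumption; it never appears in the results.
        var uri = new Uri(new Uri("http://localhost"), path);

        // For the sample input: "/", "webdav/", ..., "folder1/", "content_1.xml"
        string[] segments = uri.Segments;
        if (segments.Length < 3)
        {
            throw new ArgumentException("Expected at least one folder and a file name.", nameof(path));
        }

        // The second-to-last segment is the containing folder, with its trailing '/'.
        string folderName = segments[segments.Length - 2].TrimEnd('/');

        // Everything except the file name, joined back together.
        string folderPath = string.Concat(segments.Take(segments.Length - 1));

        // Drop the extension, then anything from the first '_' onwards (assumed rule).
        string fileName = Path.GetFileNameWithoutExtension(segments[segments.Length - 1]);
        int underscore = fileName.IndexOf('_');
        string contentName = underscore >= 0 ? fileName.Substring(0, underscore) : fileName;

        return (folderName, contentName, folderPath);
    }
}
```

Calling `WebDavPathParts.Parse(value)` on the path from the question should give `folder1`, `content`, and the folder path ending in `folder1/`, with `%20` left escaped.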

Community
Jeff Moser

The rules you need seem to be the following:

  • Folder name = last string preceding a '/' character but not containing a '/' character
  • content name = last string not containing a '/' character until (but not including) a '_' or '.' character
  • folderpath = same as folder name except it can contain a '/' character

Assuming the rules above - you probably want this code:

using System.Text.RegularExpressions;

string value = @"/webdav/MyPublication/Building%20Blocks/folder0/folder1/content_1.xml";

// last segment that is followed by a '/' and the file name
var foldernameMatch = Regex.Match(value, @"([^/]+)/[^/]+$");
// start of the file name, up to (not including) the first '_' or '.'
var contentnameMatch = Regex.Match(value, @"([^/_\.]+)[_\.][^/]*$");
// everything up to and including the last '/'
var folderpathMatch = Regex.Match(value, @"(.*/)[^/]*$");
if (foldernameMatch.Success && contentnameMatch.Success && folderpathMatch.Success)
{
    var foldername = foldernameMatch.Groups[1].Value;
    var contentname = contentnameMatch.Groups[1].Value;
    var folderpath = folderpathMatch.Groups[1].Value;
}
else
{
    // handle bad input
}

Note that you can also combine these into one large regex, although it can be more cumbersome to follow (if it weren't already):

var matches = Regex.Match(value, @"(.*/)([^/]+)/([^/_\.]+)[_\.][^/]*$");
if (matches.Success)
{
    var foldername = matches.Groups[2].Value;
    var contentname = matches.Groups[3].Value;
    var folderpath = matches.Groups[1].Value + foldername + "/";
}
else
{
    // handle bad input
}
PinnyM
  • Thank you Pinny. The content may or may not contain a '_'. If it does, I need to remove it; otherwise I have to take the name as it is. – Patan Mar 28 '12 at 17:45
  • If it doesn't contain an underscore (e.g. 'content.xml'), do you need to take the whole name or only the part up to the '.' character? – PinnyM Mar 28 '12 at 18:01
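
One way to cover both cases from the comments above is to make the separator optional and give the groups names. This is only a sketch under an assumption the thread leaves open: the content name is taken to end at the first '_' or '.', so both 'content_1.xml' and 'content.xml' yield 'content'. The names `NamedCaptureExample`, `TryParse`, and the group names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedCaptureExample
{
    // folderPath:  everything up to and including the last '/'
    // folderName:  the last folder inside folderPath
    // contentName: start of the file name, up to the first '_' or '.' (assumed rule)
    private static readonly Regex PathPattern = new Regex(
        @"^(?<folderPath>.*/(?<folderName>[^/]+)/)(?<contentName>[^/_.]+)[^/]*$");

    // Returns false (and empty strings) when the path has no folder or no file name.
    public static bool TryParse(
        string path, out string folderName, out string contentName, out string folderPath)
    {
        Match match = PathPattern.Match(path);

        // Groups of a failed match have an empty Value, so these reads are safe.
        folderName = match.Groups["folderName"].Value;
        contentName = match.Groups["contentName"].Value;
        folderPath = match.Groups["folderPath"].Value;
        return match.Success;
    }
}
```

Because `folderName` is nested inside `folderPath`, the full path does not need to be reassembled by hand. If the whole name before the extension is wanted when there is no underscore, the character class for `contentName` would have to change accordingly.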

I agree with Jeff Moser on this one, but to answer the original question, I believe the following regular expression would work: ^(\/.+\/)(.+?)\/(.+?)\.

edit: Added example.

var value = "/webdav/MyPublication/Building%20Blocks/folder0/folder1/content_1.xml";
var regex = Regex.Match(value, @"^(\/.+\/)(.+?)\/(.+?)\.");

// check if success
if (regex.Success)
{
    // assign the values from the regular expression
    var folderName = regex.Groups[2].Value;
    // group 3 is "content_1"; keep only the part before the first '_'
    var contentName = regex.Groups[3].Value.Split('_')[0];
    // group 1 stops at the parent folder, so append the folder name
    var folderPath = regex.Groups[1].Value + folderName + "/";
}
Richard