My goal is to find a file name ("MyFile.txt") inside a larger string, for example:

Some text before MyFile.txt some other text after

Currently I'm successfully using a Regular Expression with a character class of something like the following (simplified):

[\w\.\-]

This works fine until the file name contains characters outside the \w group, e.g. an em dash: "My—File.txt".

My approach:

The method Path.GetInvalidPathChars returns an array of invalid characters. I've tried to use this method, but I found no way of "converting" its result into something usable inside a Regular Expression.

I'm aware of

Still, I found no solution.

My question:

Is there any Regular Expression (or any other way) to find and extract a file name inside a larger string, based on the result of Path.GetInvalidPathChars?

Hossein Narimani Rad
Uwe Keim

2 Answers

I wouldn't use a regex for this at all, as it becomes incredibly complex and unreadable. In particular, a file name can be nearly any string, including most special characters, digits, and spaces. Even worse, there are files without a dot to separate an extension. So I'd suggest simply checking whether the string contains any of your invalid characters:

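// Requires: using System.IO; using System.Linq;
char[] invalidChars = Path.GetInvalidPathChars();
// The string is valid if none of its characters is in the invalid set.
bool valid = !myString.Any(c => invalidChars.Contains(c));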

Extracting the candidates instead is even simpler. The idea is to split your large string on all invalid characters, so that everything between two invalid characters is considered a file name, e.g.:

"myTest.extension" → "myTest.extension"
"myFile:anotherFile" → "myFile"; "anotherFile"
"myFile with space" → "myFile with space"
"a File with .-determined extension.dot" → "a File with .-determined extension.dot"

This is achieved by this code:

var fileNames = myText.Split(invalidChars);
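As a rough, self-contained sketch of the split idea: the class name `FileNameCandidates` and the method `Split` are hypothetical names chosen for illustration, and empty entries are dropped, which the one-liner above does not do. Note that the characters returned by `Path.GetInvalidFileNameChars` and `Path.GetInvalidPathChars` depend on the operating system. As far as I know, `:` is part of the file-name set on Windows but not of the path set, so the example uses the file-name set.

```csharp
using System;
using System.IO;

public static class FileNameCandidates
{
    // Hypothetical helper: splits the text on every separator character
    // and drops the empty pieces between adjacent separators.
    public static string[] Split(string text, char[] separators)
    {
        return text.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    }

    public static void Main()
    {
        // Assumption: the content of this array is platform-dependent.
        char[] invalidChars = Path.GetInvalidFileNameChars();

        foreach (string candidate in Split("myFile:anotherFile", invalidChars))
        {
            Console.WriteLine(candidate);
        }
    }
}
```

Passing the separators as a parameter keeps the helper independent of the platform, so it can also be called with a hand-picked set such as `new[] { ':', '|' }`.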

EDIT: If you really want a regex, you can build one dynamically from your invalid characters:

// Escape the characters so that e.g. '\' is taken literally inside the character class.
var pattern = String.Format("([^{0}]+)", Regex.Escape(new String(invalidChars)));
var r = new Regex(pattern);
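A minimal sketch of how such a dynamically built regex could be used to pull candidates out of a larger string. The names `FileNameRegex` and `Build` are hypothetical. Instead of relying on escaping rules, this variant writes every excluded character as a `\uXXXX` escape, which is one possible way to keep characters such as `\` or `]` from breaking the character class. Adding the space to the excluded set and keeping only matches that contain a dot are extra assumptions, suitable only if the file names contain no spaces and do have an extension.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class FileNameRegex
{
    // Hypothetical helper: builds a regex matching runs of characters
    // that are NOT in the excluded set (the set must not be empty).
    public static Regex Build(char[] excluded)
    {
        // Write each character as \uXXXX so it is literal inside [...].
        string cls = string.Concat(
            excluded.Select(c => "\\u" + ((int)c).ToString("x4")));
        return new Regex("[^" + cls + "]+");
    }

    public static void Main()
    {
        // Assumption: a space also ends a file name candidate.
        char[] excluded = Path.GetInvalidFileNameChars()
                              .Concat(new[] { ' ' })
                              .ToArray();
        Regex regex = Build(excluded);

        string line = "Some text before MyFile.txt some other text after";
        foreach (Match m in regex.Matches(line))
        {
            // Keep only candidates that look like "name.extension".
            if (m.Value.Contains("."))
            {
                Console.WriteLine(m.Value);
            }
        }
    }
}
```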
MakePeaceGreatAgain
  • Only part of my string is a file name, not the whole string. Plus: I do not want to check true/false, I want to get the string. – Uwe Keim May 22 '17 at 09:41
  • No -1 from me. The `Split` idea looks nice! – Uwe Keim May 22 '17 at 09:51
  • Instead of `String.Join("", invalidCharacters.Select(x => x.ToString()))`, I prefer [`new string(invalidCharacters)`](https://msdn.microsoft.com/en-us/library/ttyxaek9(v=vs.110).aspx). – Uwe Keim May 22 '17 at 10:58
  • `String.Join("", ` is now unnecessary, I guess. I've removed it in your code, hope this is OK. – Uwe Keim May 22 '17 at 11:50

If your file names do not contain spaces and do have an extension, then this simple idea may help you:

string line = "Some text before MyFile.txt some other text after";

//If you look for path:
//var array = Path.GetInvalidPathChars().ToList();

//If you look for file name
var array = Path.GetInvalidFileNameChars().ToList();
array.Add(' ');

var potentialFileNames = line.Split(array.ToArray(), StringSplitOptions.RemoveEmptyEntries)
                             .Where(i => i.Contains('.')).ToList();

//potentialFileNames[0] = "MyFile.txt"
Hossein Narimani Rad