I am downloading files from an FTP server. Some of the file names have spaces in them, but my Regex.Split call doesn't allow for this and breaks those names into separate pieces.

Example:

-rw-r--r-- 1 ftp ftp        8613651 Apr 15  2011 Crystal Reports User Guide.pdf

Code:

string[] splitDownloadFile = Regex.Split(dFile, @"\s+");
string fMonth = splitDownloadFile[5];
string fDate = splitDownloadFile[6];
string fyear = splitDownloadFile[7];
string fName = splitDownloadFile[8];

Is it possible to set the string fName to be the rest of the string?

Noelle

5 Answers

You could use the string.Split() method from the .NET Framework and specify the maximum number of splits.

This way the last part (filename) would not be split into separate parts.

EDIT: Code

string s = "-rw-r--r-- 1 ftp ftp        8613651 Apr 15  2011 Crystal Reports User Guide.pdf";
string[] c = {" ", "\t"};
string[] p = s.Split(c, 9, StringSplitOptions.RemoveEmptyEntries);
string name = p[8];
Console.WriteLine(name);
BergmannF

Capturing groups make this easy.

var match = Regex.Match(dFile, @"\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+(?<month>\S+)\s+(?<date>\S+)\s+(?<year>\S+)\s+(?<name>.+)");

string fName = match.Groups["name"].Value;
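
A slightly fuller sketch of the same idea, assuming the listing always has five fields before the month (as in the example line). The variable name `line` and the `{5}` repeat count are illustrative assumptions, and the code checks `match.Success` before reading any group:

```csharp
using System;
using System.Text.RegularExpressions;

// Sample listing line taken from the question.
string line = "-rw-r--r-- 1 ftp ftp        8613651 Apr 15  2011 Crystal Reports User Guide.pdf";

// Skip the first five fields (permissions, link count, owner, group, size),
// then capture month, day and year; everything after that is the file name.
var match = Regex.Match(
    line,
    @"^(?:\S+\s+){5}(?<month>\S+)\s+(?<date>\S+)\s+(?<year>\S+)\s+(?<name>.+)$");

if (match.Success)
{
    // Prints: Apr 15 2011: Crystal Reports User Guide.pdf
    Console.WriteLine("{0} {1} {2}: {3}",
        match.Groups["month"].Value,
        match.Groups["date"].Value,
        match.Groups["year"].Value,
        match.Groups["name"].Value);
}
else
{
    Console.WriteLine("Line did not match the expected listing format.");
}
```

Checking `Success` matters because a failed match returns empty strings for every group rather than throwing.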
agent-j
    How about `Regex.Match(dFile, @"(?:\S+\s+){5}(?<month>\S+)\s+(?<date>\S+)\s+(?<year>\S+)\s+(?<name>.+)");`? – Olly Apr 20 '12 at 14:09

If you use String.Split instead of Regex.Split, the String.Split(Char[], Int32, StringSplitOptions) overload gives you the result you want: the count argument caps the number of substrings, so the file name stays in one piece. You need to work out exactly which whitespace characters to cater for.

Something like:

string test = "-rw-r--r-- 1 ftp ftp        8613651 Apr 15  2011 Crystal Reports User Guide.pdf";
string[] parts = test.Split(new[] { '\t', ' ' }, 9, StringSplitOptions.RemoveEmptyEntries);

If you really want to use Regex, you could do something like this to reassemble the filename:

string[] again = Regex.Split(test, "\\s+");
var fname = string.Join(" ", again.Skip(8).ToArray());

You would need using System.Linq; at the top of your code. The file name would, however, only approximate the original: any run of consecutive spaces or tab characters is replaced with a single space.
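
To make that difference concrete, here is a small sketch that compares split-and-rejoin with a capture of the remainder. The listing line is made up for illustration (note the two spaces inside the file name), and the `{8}` count assumes eight fields precede the name:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical listing line: the file name contains two consecutive spaces.
string listing = "-rw-r--r-- 1 ftp ftp 1024 Apr 15  2011 My  Report.pdf";

// Split on whitespace and rejoin: every whitespace run collapses to one space.
string rejoined = string.Join(" ", Regex.Split(listing, @"\s+").Skip(8));

// Skip eight fields with the regex and capture the rest unchanged.
string exact = Regex.Match(listing, @"^(?:\S+\s+){8}(?<name>.*)$").Groups["name"].Value;

Console.WriteLine(rejoined); // My Report.pdf
Console.WriteLine(exact);    // My  Report.pdf
```

If the exact name is needed to request the file from the server, the captured form is the safer choice.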

Olly

By the way, you can also join all the segments from the n-th one onwards:

string fileName = String.Join(" ", splitDownloadFile.Skip(8)); // the file name starts at index 8 (the ninth segment); needs using System.Linq;

This is really just a workaround for a misuse of Split(), but it shows how you can repair the result.

abatishchev

If the format is consistent, don't split at all: match the whole line and extract each field from a named capture group.

string data = "-rw-r--r-- 1 ftp ftp        8613651 Apr 15  2011 Crystal Reports User Guide.pdf";

string pattern = @"
^                       # Beginning Anchor
(?<Permissions>[^\s]+)  # Get permissions into named capture
(?:\s+)                 # Match but don't capture space
(?<Count>\d+)
(?:\s+)
(?<Op1>[^\s]+)          # Keep capturing each field into a named group,
(?:\s+)                 # matching but not capturing the whitespace between fields.
(?<Op2>[^\s]+)
(?:\s+)
(?<Size>[^\s]+)
(?:\s+)
(?<Month>[^\s]+)
(?:\s+)
(?<Day>[^\s]+)
(?:\s+)
(?<Year>[^\s]+)
(?:\s+)
(?<FileName>[^\r\n]+)";

// IgnorePatternWhitespace applies only to the pattern, which is what lets us lay it out over several lines and comment it.
var mtGroup = Regex.Match(data, pattern, RegexOptions.IgnorePatternWhitespace).Groups;

Console.WriteLine ("In {0} we created {1}", mtGroup["Month"].Value, mtGroup["FileName"].Value);
/* Output

In Apr we created Crystal Reports User Guide.pdf

*/
ΩmegaMan