
I have a PDF file that I converted to .txt using an online tool. Now I want to parse the data in it and split it using a regular expression. I am almost done but stuck at one point.

Example of data is:

00 41 53 Bid Form – Design/Build (Single-Prime Contract)

27 05 13.23 T1 Services

I want to split it into two entries: "00 41 53 Bid Form – Design/Build (Single-Prime Contract)" and "27 05 13.23 T1 Services".

The regular expression I'm using is [0-9](\d|\ |\.)*(\D)*

Each entry starts with a number that can contain spaces and/or dots, followed by text that can contain letters, dots, commas, parentheses, hyphens, and digits.

I cannot match the text part when it contains a digit, as in "T1 Services" above.
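The behaviour described above can be reproduced with a small sketch. The class and method names here (OriginalPatternDemo, MatchAll) are illustrative assumptions; only the pattern itself comes from the question.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class OriginalPatternDemo
{
    // The pattern from the question, unchanged.
    public const string Pattern = @"[0-9](\d|\ |\.)*(\D)*";

    // Returns the text of every match of the pattern in the input.
    public static string[] MatchAll(string input)
    {
        return Regex.Matches(input, Pattern)
                    .Cast<Match>()
                    .Select(m => m.Value)
                    .ToArray();
    }
}
```

Calling OriginalPatternDemo.MatchAll("27 05 13.23 T1 Services") yields two pieces, "27 05 13.23 T" and "1 Services": the trailing (\D)* consumes only non-digit characters, so the match ends just before the "1" in "T1" and a new match begins there.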

Alan Moore
Naupad Doshi
  • (Paperclip voice impression) "It looks like you're trying to split text into individual lines that doesn't require Regular Expressions. Would you like help with that?" – Simon Whitehead Apr 12 '13 at 04:08

2 Answers


If I understood this correctly, you are trying to split on newline characters. This is in C#:

string[] Result = Regex.Split(inputText, "[\r\n]+");
Mudassir Hasan
  • I am using the same Regex.Split command, but because something is wrong with the regular expression it is not splitting properly. So my question is really about the regular expression I wrote above. – Naupad Doshi Apr 12 '13 at 04:50
  • Then this will surely help you: http://stackoverflow.com/questions/1547476/easiest-way-to-split-a-string-on-newlines-in-net – Mudassir Hasan Apr 12 '13 at 04:56

You can also do it without regex, like this:

string phrase = ".......\n,,,,.ll..\r\n....";
string[] words;

words = phrase.Split(new string[] { "\n", "\r" }, StringSplitOptions.RemoveEmptyEntries);

If you want a regex-only solution, use @mhasan's answer.
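If the goal is to pull out each numbered entry rather than only split on line breaks, matching one entry per line is another option. The following is a minimal sketch under assumptions: the names EntryMatcher and FindEntries are hypothetical, and the pattern assumes every entry sits on its own line and begins with groups of digits separated by single spaces or dots.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class EntryMatcher
{
    // number: digit groups separated by a single space or dot, e.g. "27 05 13.23"
    // title:  the rest of the line, which may itself contain digits, e.g. "T1 Services"
    private static readonly Regex EntryPattern = new Regex(
        @"^(?<number>[0-9]+(?:[ .][0-9]+)*)[ \t]+(?<title>[^\r\n]*\S)",
        RegexOptions.Multiline);

    // Returns each matched entry (number plus title) as one string.
    public static string[] FindEntries(string text)
    {
        return EntryPattern.Matches(text)
                           .Cast<Match>()
                           .Select(m => m.Value)
                           .ToArray();
    }
}
```

Because the title is matched with [^\r\n]* instead of \D*, digits inside the text no longer end the match. One limitation of this sketch: a title that starts with a purely numeric word (for example a year) would be absorbed into the number group.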

Civa