I'm trying to parse line items out of text extracted from a PDF. The extracted text is poorly formatted and comes back as one long string per page. There aren't any useful delimiters, but every line item starts with one of two strings. I've set up Split() with a string array containing both of those strings, but I need to know which delimiter each element was split on.
I found this link, but I'm not that great at RegEx. Can someone assist in writing the RegEx string?
var lineItems = page.PageText.Split(new string[] { "First String Delimiter", "Second String Delimiter" }, StringSplitOptions.None);
What I need to know is whether element[x] was the result of splitting on "First String Delimiter" or "Second String Delimiter".
EDIT: I don't care whether Regex is the solution. LINQ may be equally well suited. LINQ didn't come out until after I earned my degrees, so I'm similarly unfamiliar with it.
Imagine a page with about 15-20 of the line items shown below, end to end, coming back as one long string with no carriage returns. Since they all start with "Corporate Trade Payment Credit" or "Preauthorized ACH Credit", I can split on those, but I need to know which type each one was.
Preauthorized ACH Credit (165) 10,000.00 489546541 0000000000 Text Some long description about transaction- Preauthorized ACH Credit (165) 5,310.99 8465498461 0000000000 Text Another long description Corporate Trade Payment Credit (165) 4,933.17 8478632458775 0000000000 Text Another confidential string description.
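To make the goal concrete, here is a minimal sketch of the kind of result I'm after. It is one possible approach, not a confirmed solution. It relies on the documented behavior of Regex.Split, which keeps the matched delimiter in the output array when the pattern contains a capturing group. The class name LineItemSplitter and the method name SplitWithType are hypothetical, and the tuple syntax assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper; the names are for illustration only.
public static class LineItemSplitter
{
    // The two known prefixes that every line item starts with.
    private static readonly string[] Delimiters =
    {
        "Corporate Trade Payment Credit",
        "Preauthorized ACH Credit"
    };

    // Returns each line item paired with the delimiter that introduced it.
    public static List<(string Type, string Body)> SplitWithType(string pageText)
    {
        // Escape each delimiter so any regex metacharacters are treated literally.
        string alternation = string.Join(
            "|", Array.ConvertAll(Delimiters, d => Regex.Escape(d)));

        // The capturing group makes Regex.Split include each matched
        // delimiter in the result: [before, delim, body, delim, body, ...].
        string[] parts = Regex.Split(pageText, "(" + alternation + ")");

        var items = new List<(string Type, string Body)>();

        // parts[0] is whatever precedes the first delimiter (often empty),
        // so start at index 1 and walk the (delimiter, body) pairs.
        for (int i = 1; i + 1 < parts.Length; i += 2)
        {
            items.Add((parts[i], parts[i + 1].Trim()));
        }

        return items;
    }
}
```

With something like this, calling LineItemSplitter.SplitWithType(page.PageText) would give a list where item.Type is the delimiter that was matched and item.Body is the rest of that line item. Any text before the first delimiter is dropped in this sketch.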