
I am very bad with regex. Can anyone help me write a regex for this pattern?

Here is the pattern

(Words).(Single Character, can be empty)(white spaces)(words, can be empty):(Words, can be empty)

Here are the examples

VERS. 2.00: Ver 2.00
WRAP. NO:
STRT.F                  4501.0000:START DEPTH
WELL. C5 1H:WELL
FTG GTG. :LOCATION FOOTAGE DESCRIPTION

Update 1:

Here is what I have done.

using System.Text.RegularExpressions;

string re1 = "((?:[a-z][a-z]+))";   // Word 1
string re2 = ".*?"; // Non-greedy match on filler
string re3 = "(\\.)";   // Literal period
string re4 = "(.)"; // Any single character
string re5 = "(\\s+)";  // White space
string re6 = "((?:[a-z][a-z]+))";   // Word 2
string re7 = ".*?"; // Non-greedy match on filler
string re8 = "(:)"; // Literal colon
string re9 = ".*?"; // Non-greedy match on filler
string re10 = "(?:[a-z][a-z]+)";    // Uninteresting: word
string re11 = ".*?";    // Non-greedy match on filler
string re12 = "((?:[a-z][a-z]+))";  // Word 3

Regex r = new Regex(re1 + re2 + re3 + re4 + re5 + re6 + re7 + re8 + re9 + re10 + re11 + re12, RegexOptions.IgnoreCase | RegexOptions.Singleline);

Update 2:

Okay. I have tried something new. Here is my regex.

(\.)(.)(\s+)(4501.0000)(:)

Here is the input.

STRT DTG.F                  4501.0000:START DEPTH

And here is the output:

STRT DTG
.
F

4501.0000
:
START DEPTH

Now I only need to replace 4501.0000 in the regex with a pattern that matches any text (e.g. "some text" or "some more text").
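For reference, a minimal C# sketch of that idea that keeps the group structure from Update 2. It assumes two changes that are not in the original regex: (4501.0000) becomes ([^:]*), meaning any text up to the colon, and (.) becomes (\S?), because (.) also matches the space in lines such as "WELL. C5 1H:WELL", after which (\s+) fails. A lazy group before the period and a group after the colon are added so that a match has the seven parts shown in the output above. The class and method names are hypothetical.

```csharp
#nullable enable
using System.Text.RegularExpressions;

// Hypothetical helper built around the Update 2 regex.
public static class ValueGroupSketch
{
    // Assumed changes to the Update 2 regex:
    //   (.)         -> (\S?)    optional unit character that cannot be whitespace
    //   (4501.0000) -> ([^:]*)  any value text up to the colon
    // Group 1 (lazy, so it normally stops at the first period) and group 7
    // (everything after the colon) are added so a match yields seven parts.
    private static readonly Regex Pattern =
        new Regex(@"^(.*?)(\.)(\S?)(\s+)([^:]*)(:)(.*)$");

    // Returns the value part (group 5), or null when the line does not match.
    public static string? Value(string line)
    {
        Match m = Pattern.Match(line);
        return m.Success ? m.Groups[5].Value : null;
    }
}
```

With this pattern the value group captures "4501.0000" for the Update 2 input, "C5 1H" for "WELL. C5 1H:WELL", and an empty string for "FTG GTG. :LOCATION FOOTAGE DESCRIPTION". Units longer than one character (such as V/V) still do not match, since the sketch follows the single-character rule from the question.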

DC Slagel
fhnaseer
  • You probably need to recheck the LAS specification first. `(WORDS).` is not necessarily right; there could be spaces between the words and the period, as in `(WORDS)   .` – Yuliam Chandra Aug 27 '14 at 07:55
  • We made some modifications to LAS, so it is not strictly the LAS format (LAS-inspired, you could say). – fhnaseer Aug 27 '14 at 08:03
  • @YuliamChandra Is it possible for an LAS line to contain two colons (":")? – fhnaseer Aug 27 '14 at 08:46
  • As far as I understand, it is not possible; I wrote a converter a couple of years ago. Your regex could be something like [this](http://regex101.com/r/yS7oC5/1): group 1 is the mnemonic, group 2 the unit, group 3 the data, and group 4 the description. – Yuliam Chandra Aug 27 '14 at 08:54
  • @YuliamChandra My understanding was the same, but what if we are storing a date/time value? It could be of the form 11:12:35. – fhnaseer Aug 27 '14 at 09:05
  • @YuliamChandra That regex is perfect, although it doesn't cover the time issue I mentioned in my previous comment. I will research whether that case is possible. Thanks. – fhnaseer Aug 27 '14 at 09:08
  • The time format should be part of the data section, and you can use a different regex for that section. In your case I think it is only the header section. The data section is comma-delimited, which is a lot easier to handle without regex. You should read the full spec, though; I could be misremembering it. – Yuliam Chandra Aug 27 '14 at 09:13
  • @YuliamChandra I found a case in which the regex fails. For the input "VCLAY.V/V 123:clay volume", V/V is the unit, but the expression returns V as the unit and "V 123" as the value. Can you help me fix the regex? – fhnaseer Aug 28 '14 at 11:02
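One way to cope with the date/time case raised in these comments is to split on the last colon instead of the first. This is an assumption, not part of any regex posted here: the sketch reuses the header regex from Yuliam Chandra's answer but makes the DATA group greedy and restricts the description to [^:]*, so it only works if descriptions never contain a colon. The class name and the sample TIME line are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical variant of the answer's header regex that splits at the LAST colon.
public static class LastColonSketch
{
    // DATA is greedy (.*), so it may itself contain colons (e.g. 11:12:35).
    // DESCRIPTION is [^:]*: assumes descriptions never contain a colon.
    private static readonly Regex Pattern =
        new Regex(@"^([\w\s]*)\s*\.([^ ]*)\s*(.*):([^:]*)$");

    // Returns the four trimmed fields, or null when the line does not match.
    public static (string Mnem, string Unit, string Data, string Desc)? Parse(string line)
    {
        Match m = Pattern.Match(line);
        if (!m.Success)
        {
            return null;
        }

        return (m.Groups[1].Value.Trim(), m.Groups[2].Value.Trim(),
                m.Groups[3].Value.Trim(), m.Groups[4].Value.Trim());
    }
}
```

For the hypothetical line "TIME.  11:12:35 :START TIME", Parse returns data "11:12:35" and description "START TIME", whereas the answer's ([^:]*) stops at the first colon and yields "11" as the data. The trade-off is that a colon inside a description would now be split incorrectly.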

2 Answers


The header section of an LAS file generally has this format:

<MNEM> .<UNIT> <DATA> : <DESCRIPTION>

The regex could look like this:

^([\w\s]*)\s*\.([^ ]*)\s*([^:]*)\s*:(.*)$

Explanation

^         -> beginning of line
([\w\s]*) -> 1st group, MNEM (words and/or spaces)
\s*       -> optional whitespace
\.        -> period delimiter
([^ ]*)   -> 2nd group, UNIT (everything up to the first space)
\s*       -> optional whitespace
([^:]*)   -> 3rd group, DATA (everything up to the first colon)
\s*       -> optional whitespace
:         -> colon delimiter
(.*)      -> 4th group, DESCRIPTION (everything else)
$         -> end of line
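A minimal C# sketch that applies this regex unchanged and trims each group, assuming the header section is read one line at a time. The HeaderLine record and LasHeaderParser class are hypothetical names, not part of any LAS library.

```csharp
#nullable enable
using System.Text.RegularExpressions;

// Hypothetical container for one parsed header line, named after the
// <MNEM> .<UNIT> <DATA> : <DESCRIPTION> layout.
public record HeaderLine(string Mnemonic, string Unit, string Data, string Description);

public static class LasHeaderParser
{
    // The regex from this answer, unchanged.
    private static readonly Regex HeaderRegex =
        new Regex(@"^([\w\s]*)\s*\.([^ ]*)\s*([^:]*)\s*:(.*)$");

    // Returns null when the line does not fit the header layout.
    public static HeaderLine? Parse(string line)
    {
        Match m = HeaderRegex.Match(line);
        if (!m.Success)
        {
            return null;
        }

        // Groups can keep surrounding spaces (e.g. " Ver 2.00"), so trim each one.
        return new HeaderLine(
            m.Groups[1].Value.Trim(),
            m.Groups[2].Value.Trim(),
            m.Groups[3].Value.Trim(),
            m.Groups[4].Value.Trim());
    }
}
```

For example, "STRT.F                  4501.0000:START DEPTH" parses as STRT / F / 4501.0000 / START DEPTH, and "VCLAY.V/V 123:clay volume" parses as VCLAY / V/V / 123 / clay volume, because [^ ]* only stops at a space, so the slash stays in the unit.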

DEMO

Yuliam Chandra

Use \s or a literal space inside the character class to include whitespace as well. Something like:

     ((?:[a-z][a-z\s]+))

or

      ((?:[a-z][a-z ]+))
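A small C# check of the first pattern, assuming RegexOptions.IgnoreCase as in the question's Regex constructor (the example mnemonics are upper-case, so [a-z] alone would not match them). The class name is hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper around the answer's first pattern.
public static class WordWithSpacesSketch
{
    // IgnoreCase is an assumption carried over from the question's code.
    private static readonly Regex Words =
        new Regex(@"((?:[a-z][a-z\s]+))", RegexOptions.IgnoreCase);

    // Returns the first run of letters and whitespace in the line ("" if none).
    public static string FirstWords(string line) => Words.Match(line).Value;
}
```

On "FTG GTG. :LOCATION FOOTAGE DESCRIPTION" this returns "FTG GTG", stopping at the period. Because \s is inside the repeated class, the match can also end with trailing whitespace (for example "FTG GTG " if a space precedes the period), so trimming the result may be worthwhile.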
vks