I am trying to parse the date out of this file name with regex.

LBX845656_PayOnline_0528-20191429.txt.052819220054.bak

The problem I am encountering is that I need the regex to match the entire filename, while capturing a group with the Date in it.

I have written this regex to capture the Date in capture group 1:

([0-9]{0,8}(?=\.txt))

Using an online regex tester, this seems to capture the correct string from the filename, but I can't figure out how to also get the regex to match the whole string.

Here is what I want to return

FULL MATCH: LBX845656_PayOnline_0528-**20191429**.txt.052819220054.bak
CAPTURE GROUP 1: 20191429

Thanks in advance for any advice.

  • `\w+_\w+_\d+-([0-9]{0,8}(?=\.txt))\.txt\.\d+\.bak` would at least match what you currently have. Not sure what other variations you can have for that string, though – austin wernli Jun 18 '19 at 18:34
  • What do you mean by "match the whole string"? – wilx Jun 18 '19 at 18:35
  • [Is that OK for you](https://regex101.com/r/unmQwS/1)? – Toto Jun 18 '19 at 18:36
  • You do not need to match the whole string if you want to get the string matched with `[0-9]{0,8}(?=\.txt)`, you just need `matcher.find()` – Wiktor Stribiżew Jun 18 '19 at 18:44
  • @WiktorStribiżew The OP said they needed to match the entire string and, if it matches, grab the date portion. That's a requirement for the OP, not for the regex engine. Imo you closed down the discussion too early, and that specific topic was not covered in your cited question. – WJS Jun 18 '19 at 19:02
  • No, OP wants to extract a part of the string, but ran into trouble using `matches()`. Solution is to use `find()`. – Wiktor Stribiżew Jun 18 '19 at 19:05
  • Look at what the OP wanted to return. A full match and the date. The requirement was stated clearly at the end of the question. It was even highlighted. And the solution would be to use `matches()` for a complete match of the filename. – WJS Jun 18 '19 at 19:06
  • The full match is the whole string, why match it at all. If a match is found, do whatever you need with the input string. OP has a so called XY problem here. – Wiktor Stribiżew Jun 18 '19 at 19:28
  • @WiktorStribiżew For clarity, I need to match the entire string AND the capture group because the regex plugin in Pentaho Data Integration required the Full Match, to be able to return the group match. Robert Glickman was able to understand the question and help me. Thanks. – jordannorton Jun 18 '19 at 19:53
  • You should edit the question to make it clear. Add the appropriate tag(s), too. Besides, there is no way to actually answer the question, one can only *guess* what context your group starts and ends as you provided no regex pattern requirements. – Wiktor Stribiżew Jun 18 '19 at 20:02
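
The `find()` versus `matches()` distinction debated in the comments above can be sketched in Java, the language those method names suggest. This is only an illustration: the class and method names (`FindVsMatches`, `firstGroupWithFind`, `matchesWhole`) are made up for the example, and it says nothing about how the Pentaho regex step behaves internally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question.
final class FindVsMatches {
    // The pattern from the question: up to 8 digits directly before ".txt".
    static final Pattern DATE = Pattern.compile("([0-9]{0,8}(?=\\.txt))");

    // find() accepts a partial match anywhere in the input.
    // Returns group 1 of the first match, or null if there is none.
    static String firstGroupWithFind(String input) {
        Matcher m = DATE.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // matches() succeeds only if the pattern consumes the entire input.
    static boolean matchesWhole(String input) {
        return DATE.matcher(input).matches();
    }

    public static void main(String[] args) {
        String name = "LBX845656_PayOnline_0528-20191429.txt.052819220054.bak";
        System.out.println(firstGroupWithFind(name)); // prints 20191429
        System.out.println(matchesWhole(name));       // prints false
    }
}
```

So the original pattern is enough when the tool searches with `find()`, but a tool that requires a full match needs a pattern that also accounts for the text on both sides of the group.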

1 Answer

I think this is a simpler regex that will solve your problem, assuming the date is always 8 digits.

Option 1

If you can assume that the date is the only 8-digit string in the filename:

^.*[^\d](\d{8})[^\d].*$

Option 2

If you want to assume that it is preceded by a "-" and followed by a ".":

^.*-(\d{8})\..*$

Option 3

If you want to assume it is followed by .txt:

^.*(\d{8})\.txt.*$
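
As a rough check, the three options can be run against the sample filename with Java's `Matcher.matches()`, which requires the whole string to match. The class name `FilenameDate` and the helper `extract` are invented for this sketch, and Java is an assumption based on the `matches()`/`find()` discussion under the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for trying out the three options.
final class FilenameDate {
    // Option 1: the only run of exactly 8 digits bounded by non-digits.
    static final Pattern OPTION_1 = Pattern.compile("^.*[^\\d](\\d{8})[^\\d].*$");
    // Option 2: 8 digits preceded by "-" and followed by ".".
    static final Pattern OPTION_2 = Pattern.compile("^.*-(\\d{8})\\..*$");
    // Option 3: 8 digits followed by ".txt".
    static final Pattern OPTION_3 = Pattern.compile("^.*(\\d{8})\\.txt.*$");

    // Returns capture group 1 if the pattern matches the entire file name, else null.
    static String extract(Pattern pattern, String fileName) {
        Matcher m = pattern.matcher(fileName);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String name = "LBX845656_PayOnline_0528-20191429.txt.052819220054.bak";
        System.out.println(extract(OPTION_1, name)); // prints 20191429
        System.out.println(extract(OPTION_2, name)); // prints 20191429
        System.out.println(extract(OPTION_3, name)); // prints 20191429
    }
}
```

In each case the full match is the entire filename and group 1 holds the 8 digits, which is the shape the question asks for.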

  • As an aside, 20191429 does not look like the date, since 14 and 29 are both invalid for month. What it looks like to me is 0528-20191429 is the date plus (possibly) the time (May 28, 2019 at 2:29pm) – Robert Glickman Jun 18 '19 at 18:42
  • Thanks for your answer, Robert. And ugh, yes, I noticed that after posting. The date format in the file is "MMDD-YYYYMMSS". But I can definitely work with what you have provided me; I was thinking about the regex in the wrong way. Thanks again. – jordannorton Jun 18 '19 at 18:46