
I have the following line from a CSV file:

Name,Age,Country,State,Zip,Phone,Email,Address

I am using the following Java regex to capture `Name,Age,Country` into one group, but it always captures everything up to the last comma:

Regex --> `^((?:.*,){3})`

Result --> `Name,Age,Country,State,Zip,Phone,Email,`

Why is it not respecting the {3} quantifier I am using?

summerNight
  • Because `.` matches all characters and `*` is greedy. – ctwheels Oct 30 '17 at 15:18
  • What happens when you add a question mark directly after the *, like this: `.*?` – omijn Oct 30 '17 at 15:20
  • You cannot extract separate groups with `{3}` anyway. You will need `^([^,]*),([^,]*),([^,]*)` – anubhava Oct 30 '17 at 15:20
  • I've seen it's closed as a duplicate, and I won't reopen it, but the linked question has quite bad answers; have a look at the one here. Explicit matching should always be preferred to just the non-greedy solution. – Denys Séguret Oct 30 '17 at 15:21
  • @anubhava or simply use `[^,]+` instead and forget about capture groups altogether. – ctwheels Oct 30 '17 at 15:22

1 Answer


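A dot matches a comma too, so each greedy `.*,` can run across several fields. The `{3}` is respected: the group does repeat exactly three times, but together the three repetitions stretch to the last comma. You have two solutions: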

  • the bad one: make it not greedy: `^((?:.*?,){3})`
  • the right one: exclude commas: `^((?:[^,]*,){3})`

The first one is bad because a lazy `.*?` can still match commas whenever the engine has to backtrack, which makes it expensive and gives it potential for catastrophic backtracking.
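
To see the difference on the line from the question, here is a minimal sketch that runs all three patterns. The class name `CsvPrefixDemo` and the helper `firstGroup` are made up for this illustration. The last pattern, which drops the trailing comma, is a variation that is not from the thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CsvPrefixDemo {
    // Hypothetical helper: returns group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "Name,Age,Country,State,Zip,Phone,Email,Address";

        // Greedy dot: three repetitions, but each one may swallow several commas.
        System.out.println(firstGroup("^((?:.*,){3})", line));
        // prints Name,Age,Country,State,Zip,Phone,Email,

        // Lazy dot: gives the wanted result here, but relies on backtracking.
        System.out.println(firstGroup("^((?:.*?,){3})", line));
        // prints Name,Age,Country,

        // Negated class: a repetition can never cross a comma.
        System.out.println(firstGroup("^((?:[^,]*,){3})", line));
        // prints Name,Age,Country,

        // Variation (assumption, not from the thread): same idea without the trailing comma.
        System.out.println(firstGroup("^([^,]*(?:,[^,]*){2})", line));
        // prints Name,Age,Country
    }
}
```

Both the lazy and the negated-class patterns return the same text on this input. The difference only shows up in how much work the engine does, especially on lines where the overall match fails.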

Denys Séguret