I'm trying to create a regex pattern to capture information from my credit card invoice, which I can only get as a PDF. I copy the text into a text editor and then use the Replace tool in Notepad++ to convert the copied text to CSV.
I'm having a problem with negative values.
Given this piece of text:
16/04 RC GRACAS 2 - 0,02
SAÚDE .RECIFE
16/04 RC GRACAS 2 02/03 45,97
SAÚDE .RECIFE
The text above contains data for two bill entries. The regex should capture the following groups in the first entry:
"16/04": date
"RC GRACAS 2": description
"-": value sign
"0,02": value
"SAÚDE .RECIFE": categories
As well as the following groups in the second entry:
"16/04": date
"RC GRACAS 2 02/03": description
"": value sign
"45,97": value
"SAÚDE .RECIFE": categories
The current regex I have is this:
^(\d{2}/\d{2})\s+(.*)\s+([-+]?)\s?(\d{1,3}(?:\.\d{3})*(?:,\d+)?)\s+(.*)?
The problem is that, in the first entry, the regex doesn't capture the minus sign in its own group (value sign); it becomes part of the second group (description) instead.
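To make the behavior easier to reproduce outside Notepad++, here is a minimal sketch in Python. It assumes that Python's re module with re.MULTILINE behaves like Notepad++ here (with ". matches newline" unchecked), which I have not verified for every detail. The names PATTERN and SAMPLE are just for illustration.

```python
import re

# The pattern from the question, unchanged.
PATTERN = re.compile(
    r"^(\d{2}/\d{2})\s+(.*)\s+([-+]?)\s?(\d{1,3}(?:\.\d{3})*(?:,\d+)?)\s+(.*)?",
    re.MULTILINE,
)

# The two entries from the question, one line break after each line.
SAMPLE = (
    "16/04 RC GRACAS 2 - 0,02\n"
    "SAÚDE .RECIFE\n"
    "16/04 RC GRACAS 2 02/03 45,97\n"
    "SAÚDE .RECIFE\n"
)

# Each tuple is (date, description, value sign, value, categories).
matches = [m.groups() for m in PATTERN.finditer(SAMPLE)]
for groups in matches:
    print(groups)
```

In the first tuple the description comes out as "RC GRACAS 2 -" and the value sign is empty. The greedy (.*) takes as much as it can, and since ([-+]?) is optional, the engine is satisfied with an empty sign group and never gives the "-" back.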
How can I change this regex to capture that sign in its own group?
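One possible direction, sketched in Python and not tested in Notepad++ itself: make the description group lazy with (.*?) and require the value to be the last thing on its line. Making the group lazy on its own would not be enough, because the "2" in "RC GRACAS 2" would then be taken as the value. The sketch assumes the value always ends the first line of an entry and the categories always sit on the next line; CANDIDATE and SAMPLE are hypothetical names.

```python
import re

# Candidate variant: lazy description, and the value must be followed
# only by optional spaces/tabs and a line break.
CANDIDATE = re.compile(
    r"^(\d{2}/\d{2})\s+(.*?)\s+([-+]?)\s?(\d{1,3}(?:\.\d{3})*(?:,\d+)?)[ \t]*\r?\n(.*)",
    re.MULTILINE,
)

SAMPLE = (
    "16/04 RC GRACAS 2 - 0,02\n"
    "SAÚDE .RECIFE\n"
    "16/04 RC GRACAS 2 02/03 45,97\n"
    "SAÚDE .RECIFE\n"
)

# Each tuple is (date, description, value sign, value, categories).
entries = [m.groups() for m in CANDIDATE.finditer(SAMPLE)]
for entry in entries:
    print(entry)
```

With this sample the sign lands in its own group for the first entry, and "02/03" stays in the description for the second one, since it is not followed by a line break.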