
I'm having trouble using stringr to extract the text between two markers. I need to get everything between these markers, including the line breaks:

reprEx <- "2100\n\nELECTRONIC WITHDRAWALS| om93 CCD ID: 964En To American Hon\nELECTRONIC WITHDRAWALSda Finance Corp 295.00\nTotal Electronic Withdrawals $93,735.18\n[OTHER WITHDRAWALS| WITHDRAWALS\nDATE DES $93,735.18\n[OTHER WITHDRAWALS| WITHDRAWALS\nDATE DESCRIPTION AMOUNT\n04/09 Pmt ID 7807388390 Refunded IN Error On 04/08"

desiredResult <- "| om93 CCD ID: 964En To American Hon\nELECTRONIC WITHDRAWALSda Finance Corp 295.00\nTotal Electronic Withdrawals $93,735.18\n[OTHER WITHDRAWALS| WITHDRAWALS\nDATE DES $93,735.18\n["

I have tried using:

desiredResult <- str_match(reprEx, "ELECTRONIC WITHDRAWALS\\s*(.*?)\\s*OTHER WITHDRAWALS")[,2]

but I just get `NA` back. I want everything in the string between the first occurrence of ELECTRONIC WITHDRAWALS and the first occurrence of OTHER WITHDRAWALS. I can't tell whether the newlines are what is causing the problem.

gizaom
  • Your desired result is inconsistent with your statement: *"between the first occurrence ... and the first occurrence ..."*, where you have the second string *within* your output. – r2evans Apr 09 '20 at 22:22
  • Because this is listed as one long string, though, I do not believe the `\n` is having an adverse effect. – r2evans Apr 09 '20 at 22:25
  • `"(?s)ELECTRONIC WITHDRAWALS\\s*(.*?)\\s*OTHER WITHDRAWALS"`, see the [answer](https://stackoverflow.com/a/45981809/3832970). – Wiktor Stribiżew Apr 09 '20 at 22:51
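The `(?s)` flag in the last comment matters because stringr's regex engine (ICU) does not let `.` match line terminators by default, so `(.*?)` cannot cross the `\n` characters and `str_match` returns `NA`. A minimal sketch of that fix, assuming the stringr package is installed; `viaFlag` and `viaOption` are illustrative names, and `regex(dotall = TRUE)` is stringr's option form of the same inline flag:

```r
library(stringr)

reprEx <- "2100\n\nELECTRONIC WITHDRAWALS| om93 CCD ID: 964En To American Hon\nELECTRONIC WITHDRAWALSda Finance Corp 295.00\nTotal Electronic Withdrawals $93,735.18\n[OTHER WITHDRAWALS| WITHDRAWALS\nDATE DES $93,735.18\n[OTHER WITHDRAWALS| WITHDRAWALS\nDATE DESCRIPTION AMOUNT\n04/09 Pmt ID 7807388390 Refunded IN Error On 04/08"

# Inline flag: (?s) makes "." match newlines as well
viaFlag <- str_match(
  reprEx,
  "(?s)ELECTRONIC WITHDRAWALS\\s*(.*?)\\s*OTHER WITHDRAWALS"
)[, 2]

# Same pattern without the inline flag, using the dotall option instead
viaOption <- str_match(
  reprEx,
  regex("ELECTRONIC WITHDRAWALS\\s*(.*?)\\s*OTHER WITHDRAWALS", dotall = TRUE)
)[, 2]

# Both calls should return the same substring
identical(viaFlag, viaOption)
```

Because `(.*?)` is lazy, the capture stops at the first OTHER WITHDRAWALS that follows the first ELECTRONIC WITHDRAWALS, which matches the wording of the question rather than the posted `desiredResult`.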

1 Answer


I think your `desiredResult` is inconsistent with your paragraph, so I'll prioritize the latter:

> everything in the string that is between the first occurrence of ELECTRONIC WITHDRAWALS and the first occurrence of OTHER WITHDRAWALS

first <- gregexpr("ELECTRONIC WITHDRAWALS", reprEx)[[1]]
first
# [1]  7 66
# attr(,"match.length")
# [1] 22 22
# attr(,"index.type")
# [1] "chars"
# attr(,"useBytes")
# [1] TRUE
# generalized a little, in case you change the reprEx string
leftside <- if (first[1] > 0) first[1] + attr(first, "match.length")[1] else 1
second <- gregexpr("OTHER WITHDRAWALS", substr(reprEx, leftside, nchar(reprEx)))[[1]]
second
# [1] 124 176
# attr(,"match.length")
# [1] 17 17
# attr(,"index.type")
# [1] "chars"
# attr(,"useBytes")
# [1] TRUE
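# second[1] is relative to leftside: -1 converts it to a position in reprEx,
# and another -1 stops just before "OTHER WITHDRAWALS" begins
rightside <- leftside + second[1] - 2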
c(leftside, rightside)
# [1]  29 151
substr(reprEx, leftside, rightside)
# [1] "| om93 CCD ID: 964En To American Hon\nELECTRONIC WITHDRAWALSda Finance Corp 295.00\nTotal Electronic Withdrawals $93,735.18\n["
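
The index arithmetic above can be wrapped in a small reusable function. This is only a sketch in base R: `extract_between` is a hypothetical helper name, it assumes a single input string, it treats both markers as literal text (`fixed = TRUE`), and, unlike the code above, it returns `NA` instead of falling back to position 1 when a marker is missing.

```r
# Hypothetical helper: text between the first `left` and the first `right` after it
extract_between <- function(x, left, right) {
  start <- regexpr(left, x, fixed = TRUE)
  if (start == -1) return(NA_character_)        # left marker not found
  # first character after the left marker
  from <- as.integer(start) + attr(start, "match.length")
  rest <- substr(x, from, nchar(x))
  end <- as.integer(regexpr(right, rest, fixed = TRUE))
  if (end == -1) return(NA_character_)          # right marker not found
  substr(rest, 1, end - 1)                      # stop just before the right marker
}

extract_between("aa<START>middle<END>zz<END>", "<START>", "<END>")
# [1] "middle"
```

Calling `extract_between(reprEx, "ELECTRONIC WITHDRAWALS", "OTHER WITHDRAWALS")` should give the same substring as the `substr()` call above.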
r2evans