Regex: Finding space between two strings that is too long

Question

I have an XML file that I am trying to parse into my database, but am getting an error stating a certain field exceeds my max character count (2000). I've identified the field in question, but don't have a row number in my error, so I have to find and delete the offender(s) in the XML itself.

Below is a sample. I need to find any entries where the text between the first occurrence of "CCCStmts Correction" and "RoAmts" is longer than 2000 characters. I'm using Notepad++, and the only approach I can think of is a regex. Ideas?

    <CCCStmts Correction="sample text" />
    <CCCStmts Correction="sample text" />
    <CCCStmts Correction="sample text" />
    <CCCStmts Correction="sample text" />
    <CCCStmts Correction="sample text" />
    <CCCStmts Correction="sample text" />
    <CCCStmts Correction="sample text" />
    <RoAmts PayType="x" AmtType="x" TotalAmt="x" />
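The check described above can be sketched outside Notepad++ in Java, using plain string search instead of a regex. This is a sketch under assumptions: every group of CCCStmts lines ends at the next RoAmts element, as in the sample; the measured span includes the markup, so it overestimates the length of the text that actually reaches the database; and the class and method names (`LongSpanFinder`, `findLongSpans`, `lineOf`) are hypothetical. It prints the line number where each over-long group starts, so you can jump there in the editor.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds groups whose span from the first "<CCCStmts Correction"
// to the following "<RoAmts" is longer than a limit.
// Assumption: every group is terminated by an <RoAmts element, as in the sample.
class LongSpanFinder {
    static final String START = "<CCCStmts Correction";
    static final String END = "<RoAmts";

    /** Returns the 1-based line numbers where over-long groups begin. */
    static List<Integer> findLongSpans(String xml, int maxChars) {
        List<Integer> lines = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = xml.indexOf(START, from); // first statement of the next group
            if (start < 0) break;
            int end = xml.indexOf(END, start);    // the group's closing <RoAmts
            if (end < 0) break;                   // unterminated group: stop searching
            // The span includes the markup, so it overestimates the stored text.
            if (end - start > maxChars) lines.add(lineOf(xml, start));
            from = end + END.length();            // continue after this <RoAmts
        }
        return lines;
    }

    /** Counts newlines before the offset to get a 1-based line number. */
    static int lineOf(String text, int offset) {
        int line = 1;
        for (int i = 0; i < offset; i++) {
            if (text.charAt(i) == '\n') line++;
        }
        return line;
    }

    public static void main(String[] args) throws Exception {
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (int line : findLongSpans(xml, 2000)) {
            System.out.println("Over-long group starts on line " + line);
        }
    }
}
```

Because the search resumes after each RoAmts, only the first CCCStmts of every group is treated as a starting point, matching the "first occurrence" wording of the question.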

I'm voting to close this question as off-topic because "*I have a problem, here, do it for me.*" is off topic for stackoverflow. If you have written code to solve your problem and it isn't working, then asking a question about your code would be on topic. — TessellatingHeckler, Jun 21 '18 at 19:01
Possible duplicate of [Find lines by length in NotePad++?](https://stackoverflow.com/questions/20776295/find-lines-by-length-in-notepad) — ggorlen, Jun 21 '18 at 19:01
Hi. Please read [How to Ask](https://stackoverflow.com/questions/how-to-ask), [how to create a Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve), and then edit your question accordingly. You may also want to check out [the site tour](http://stackoverflow.com/tour) to learn more about how things work around here. More and more [questions from uninformed users are being closed](https://meta.stackoverflow.com/questions/369464/enabling-easier-elimination-of-posts-by-new-users-that-disregard-documentation); it shows. (If you have finished the tour, you're informed :D) — wp78de, Jun 21 '18 at 19:14
This must be the thousandth (at least) question here about parsing [X]HTML with a regex, where someone has to post the obligatory answer, which is contained in [this answer](https://stackoverflow.com/a/1732454/62576). Why can't anyone learn to search for *parse XML with regex* to find it themselves? Use a DOM parser: it lets you get at the content easily and work with it in whatever language you like, which is the proper way to do things when you're not trying to do pattern matching (which is what regexes are intended for). — Ken White, Jun 21 '18 at 23:27

score 0 · Answer 1 · answered Jun 21 '18 at 20:51

Regex is not the answer. You could do it with regex, of course, but I assume you have used an API to represent the XML programmatically in a model? Or, even if not, that you are parsing it in order to submit the relevant value contained within the XML to your database. So once you have acquired the value, simply test its length at that point, and submit it only if it conforms to the field's requirements.
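A minimal sketch of that flow in Java, using the JDK's built-in DOM parser (`javax.xml.parsers.DocumentBuilderFactory`) to pull out the Correction values. Assumption: the 2000-character field holds all Correction texts of one parent element concatenated, which is one reading of "between the first CCCStmts Correction and RoAmts"; the parent element's name is not shown in the question, and the sketch does not rely on it. If the field is actually a single attribute, check each `getAttribute("Correction")` result on its own instead. The class and method names are hypothetical.

```java
import java.io.File;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class CorrectionExtractor {
    /** Parses an XML string, wrapping the parser's checked exceptions. */
    static Document parseString(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /**
     * Concatenates the Correction attributes of consecutive <CCCStmts> elements
     * that share a parent. Assumption: one such combined string is what goes
     * into the 2000-character database field.
     */
    static List<String> combinedCorrections(Document doc) {
        List<String> result = new ArrayList<>();
        NodeList stmts = doc.getElementsByTagName("CCCStmts"); // document order
        Node currentParent = null;
        StringBuilder current = null;
        for (int i = 0; i < stmts.getLength(); i++) {
            Element stmt = (Element) stmts.item(i);
            Node parent = stmt.getParentNode();
            if (parent != currentParent) {            // a new group starts
                if (current != null) result.add(current.toString());
                current = new StringBuilder();
                currentParent = parent;
            }
            current.append(stmt.getAttribute("Correction"));
        }
        if (current != null) result.add(current.toString());
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new File(args[0]));
        List<String> values = combinedCorrections(doc);
        for (int i = 0; i < values.size(); i++) {
            int len = values.get(i).length();
            if (len > 2000) System.out.println("Group #" + i + " has " + len + " characters");
        }
    }
}
```

Working on the parsed values measures only the attribute text, not the surrounding markup, which is closer to what the database actually receives.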

To check the length of the string, simply use...

// if the length is 2000 or less
if (string.length() <= 2000) {
    // your routine
}

... and it will skip over any value that is composed of 2001+ characters.
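Expanded into a self-contained sketch, the same guard can filter a batch of extracted values and report the ones it skips. The names (`FieldLengthGuard`, `fitsField`, `keepInsertable`) are hypothetical, and the database insert is left as a comment. Note that Java's `String.length()` counts UTF-16 code units; whether the column limit counts characters or bytes depends on the database and column type, so the threshold may need adjusting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper applying the length check to a batch of values.
class FieldLengthGuard {
    static final int MAX_FIELD_LENGTH = 2000; // the column limit from the question

    static boolean fitsField(String value) {
        return value.length() <= MAX_FIELD_LENGTH;
    }

    /** Returns the values that fit; oversized ones are reported and skipped. */
    static List<String> keepInsertable(List<String> values) {
        List<String> accepted = new ArrayList<>();
        for (int i = 0; i < values.size(); i++) {
            String v = values.get(i);
            if (fitsField(v)) {
                accepted.add(v); // in real code: insert into the database here
            } else {
                System.err.println("Skipping value #" + i + ": " + v.length() + " characters");
            }
        }
        return accepted;
    }
}
```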

This approach does not require an extra pass over the file purely to search, and does not require any replacements in the XML. It will be much tidier and more efficient.
