I have a basic regex that splits text on some simple characters:

/(\.|"|0-9|!|\?|,|:|;|¡|¿|჻|·|…|؟|،|॥|।|«|\(|\)|\{|»|<|>|⸗|⸚|֊|᛫|᠂|᠃|᠅|⁏|።|፣|、|。|‥|「|」|『|』|〝|〟|“|”|,|—)/

But I would like to also split the text if a certain block of text goes on long enough without any of these characters. Is this possible with regex?

  • Add `.{x}` as your last alternative, where `x` is the number of characters. – CrayonViolent Sep 11 '22 at 22:09
  • You can use split("") for splitting every char of a string. – Rohit Khandelwal Sep 11 '22 at 22:10
  • @CrayonViolent I just add that to the end of the regex? Edit: That did it. Thank you! – DingleberrySmith Sep 11 '22 at 22:19
  • You should use a character class, not an alternation: https://stackoverflow.com/questions/22132450/why-is-a-character-class-faster-than-alternation – Nick Sep 11 '22 at 23:49
  • @Nick Thank you! I wasn't aware that it was slower. – DingleberrySmith Sep 12 '22 at 00:43
  • @DingleberrySmith no worries. It also has the advantage that you don't need to escape most characters inside a class. The one thing to be careful of is `-`: if there is one, put it at either the beginning or the end of the class, otherwise it is interpreted as part of a range (as in, for example, `0-9`). – Nick Sep 12 '22 at 00:46
  • @DingleberrySmith yeah, as @Nick alluded to, the regex could be cleaned up a lot in general. Note that the part I mentioned should NOT go in the char class; it should remain a separate alternative. So e.g. `/[."0-9!?]|.{x}/` – CrayonViolent Sep 12 '22 at 15:47
  • @CrayonViolent Thank you for the suggestion. Quick question though. How can I do this with combinations of characters like (gh) or a period followed by a space? I tried just wrapping them in parentheses but it doesn't seem to work. – DingleberrySmith Sep 12 '22 at 16:19
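  • @DingleberrySmith character classes only match a single character at a time. If you want to match a pattern involving more, you have to add it as an alternative. Put it before the class: the first alternative that matches at a position wins, so a class containing `.` would otherwise consume the period by itself. E.g. `/\.\s|[."0-9!?]|.{x}/` – CrayonViolent Sep 13 '22 at 19:12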
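
Putting the comments together, here is a minimal JavaScript sketch of the idea, not a definitive implementation. The function name `splitText`, the `maxLen` parameter, and the shortened delimiter list are all hypothetical and only for illustration. It also assumes a small variation on the suggestion above: the fallback is `[^…]{x}` (a run of `x` non-delimiter characters) rather than `.{x}`, so that a fixed-length chunk cannot swallow a delimiter.

```javascript
// Hypothetical helper: splitText, maxLen and the short delimiter list
// are illustrative assumptions, not part of the original question.
function splitText(text, maxLen = 5) {
  // Single-character delimiters; none of these needs escaping in a class.
  const delims = '.!?,:;"';

  const pattern = new RegExp(
    "(" +
      "\\.\\s" +                       // multi-character delimiter, tried first
      `|[${delims}]` +                 // any single delimiter character
      `|[^${delims}]{${maxLen}}` +     // fallback: maxLen non-delimiters in a row
    ")"
  );

  // The capture group makes split() keep the matched pieces;
  // empty strings between adjacent matches are dropped.
  return text.split(pattern).filter((piece) => piece !== "");
}

console.log(splitText("abcdefgh. ij!k", 5));
// [ 'abcde', 'fgh', '. ', 'ij', '!', 'k' ]

// Order of alternatives matters: with the class first, ". " is never matched.
console.log("a. b".split(/([.]|\.\s)/)); // [ 'a', '.', ' b' ]
console.log("a. b".split(/(\.\s|[.])/)); // [ 'a', '. ', 'b' ]
```

Note that the fixed-length chunks are counted from wherever the previous match ended, so they can cut through the middle of a word; whether that is acceptable depends on what the pieces are used for.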
