I'm working on an app that adapts text to braille specifications, which have some tricky rules for handling uppercase letters, and I'd like some help. The rules are:

  1. Before a single uppercase letter, add ":"

:This is an :Example

  2. Before multiple uppercase letters and all-caps words, add another ":"

:This is ::ANOTHER ex::AMple, ::ALRIGHT

  3. If a sequence of uppercase words is made of more than three uppercase words in a row, add "-" to the beginning of the sequence and delete all other "::" within that sequence, except for the last one

:This is -::A VERY LONG SENTENCE WITH A SEQUENCE OF ALL ::CAPS to serve ::AS ::AN :Example

  4. Finally, if it goes from uppercase to lowercase mid-word (except when only the first letter is capitalized), add ";"

:This is my fin:A;l ::EXAM;ple

Using regex, I was able to handle the simple rules, but not all of them.

// adds : before any uppercase
var firstChange = text.replace(/[A-Z]+/g, ':$&');

// adds : to double+ uppercase
var secondChange = firstChange.replace(/[A-Z]{2,}/g, ':$&');

// adds ; to upper-lower change
var thirdChange = secondChange.replace(/\B[A-Z]+(?=[a-z])/g, '$&;');

I tried building up from simple to complex, then the other way around, then merging some rules; either way, they conflict. I'm new to regex and could use any insight on how to solve this. Thank you.

Edit: To make it more clear, I made a final example that combines all rules.

This is an Example. This is ANOTHER exAMple, ALRIGHT? This is A VERY LONG SENTENCE WITH A SEQUENCE OF ALL CAPS to serve AS AN Example. This is my finAl EXAMple.

Should become:

:This is an :Example. :This is ::ANOTHER ex::AM;ple, ::ALRIGHT? :This is -::A VERY LONG SENTENCE WITH A SEQUENCE OF ALL ::CAPS to serve ::AS ::AN :Example. :This is my fin:A;l ::EXAM;ple.


SOLVED: With the help of @ChrisMaurer and @SaSkY, here is the code to solve the above problem:

(edit: fixed fourth change thanks to @SaSkY)

var original = document.getElementById("area1");
var another = document.getElementById("area2");

function MyFunction() {

  // include : before every uppercase
  var firstChange = original.value.replace(/[A-Z]+/g, ':$&');

  // add one more : before multiple uppercase letters and single-letter uppercase words
  var secondChange = firstChange.replace(/([A-Z]{2,}|\b[A-Z]+\b)/g, ':$&');

  // add - to beginning of long uppercase sequence
  var thirdChange = secondChange.replace(/\B(::[A-Z]+(\s+::[A-Z]+){3,})/g, '-$&');

  // removes extra :: before words within long uppercase sequence
  var fourthChange = thirdChange.replace(/(?<=-::[A-Z]+\s(?:::[A-Z]+\s)*)::(?=[A-Z]+\s)(?![A-Z]+\s(?!::[A-Z]+\b))/g, '');

  // add a lowercase symbol when it changes from uppercase to lowercase mid word
  var fifthChange = fourthChange.replace(/\B[A-Z](?=[a-z])/g, '$&;');

  // update the output textarea
  another.value = fifthChange;
}
<html>
<body>
<textarea id="area1"  rows="4" cols="40" onkeyup="MyFunction()">
</textarea>
<textarea id="area2" rows="4" cols="40"></textarea>
</body>
</html>
  • @Andreas I meant "sequence", not "sentence" (I'm going to edit it). The rest of the text remains the same, just the sequence of uppercase words changes. "A VERY" is the beginning of it, "CAPS" is the end, the words in between them get their :: erased. – Cassio Polegatto Dec 31 '22 at 14:34
  • I'm not understanding your third example. Why do you have anything before the word CAPS or the word AN? The word CAPS is already covered by the `-::` and if you do `-::` in front of AS it will carry over to AN, right? Oh, I did not read, "except for the last one". Oy vay. – Chris Maurer Dec 31 '22 at 14:56
  • @ChrisMaurer Yeah, the last word in the sequence is the exception. I kept "AN" and "AS" there to show that this third rule does not apply to two (or three) uppercase words in a row; those still go by the second rule. Only when it's four or more do they get -:: at the beginning and :: before the last word. – Cassio Polegatto Dec 31 '22 at 15:05
  • @CassioPolegatto In the second example, the word `ex::AMple` goes from capital letters `AM` to small letters `ple` and you didn't add `;` before `ple`. Why? – SaSkY Dec 31 '22 at 15:17
  • @SaSkY I was building up on the rules to keep it organized. I only added ; after I stated the rule. – Cassio Polegatto Dec 31 '22 at 15:27
  • @CassioPolegatto You said that `A VERY` is the beginning of the sequence. Regex treats `A VERY` as two different words, not one, but you treat it as one word, as well as `A SEQUENCE`. Am I right? – SaSkY Dec 31 '22 at 15:54
  • No, in my comment I was just being trivial about it. I'm not sure how to approach it in regex. I could do it like this: /[A-Z]+\s[A-Z]+\s[A-Z]+\s/g and find three uppercase words in a row, but then it would include : every three words, which is not what I want. – Cassio Polegatto Dec 31 '22 at 15:58
  • @CassioPolegatto Do you want to consider the `A` in `A VERY` as a different word from `VERY`, or do you want to consider it part of `VERY`? I'm asking because, for an example like `This is A VERY GOOD EXAMPLE`, do you want it to be `:This is :A ::VERY ::GOOD ::EXAMPLE` or `:This is -::A VERY GOOD ::EXAMPLE`? – SaSkY Dec 31 '22 at 16:02
  • @SaSkY I see, I misunderstood your first question. I want it to be like the second example you gave, that's exactly it. – Cassio Polegatto Dec 31 '22 at 16:08
  • I'm gonna try to provide a solution because I think it's fun and challenging, but if someone gave me this task I would no doubt create a parser instead – Christian Vincenzo Traina Dec 31 '22 at 17:27
  • @CassioPolegatto You have to remove this regex `(?<=::[A-Z]+\s*)::([A-Z]+)(?=\s*::[A-Z]+)` because it will consider the word `::CAPSss` in the string `:A -::VERY ::LONG ::SENTENCE ::WITH ::A ::SEQUENCE ::OF ::ALL ::CAPSss` as the end of the sequence instead of `::ALL`. Another issue is that the regex will match the word `::TEST` in `::LOL ::TEST ::LOL` and the colons would be removed from that word `::TEST`. I created a new regex pattern for this step: `(?<=-::[A-Z]+\s(?:::[A-Z]+\s)*)::(?=[A-Z]+\s)(?![A-Z]+\s(?!::[A-Z]+\b))` and the replacement string should be empty instead of `$1`. – SaSkY Jan 02 '23 at 03:25
  • @CassioPolegatto Thank you. There is one more thing to note here, the third regex `\B(::[A-Z]+(\s+::[A-Z]+){3,})` has to be edited because it will consider the word `::EXAMple` in `::A ::VERY ::GOOD ::EXAMple` as the end of the sequence and will add `-` to the word `::A`, all we have to do is to add a word boundary `\b` to ensure that the end of the sequence is only an uppercase word, the new regex would be like `\B(::[A-Z]+(\s+::[A-Z]+){3,})\b`. – SaSkY Jan 03 '23 at 03:42
  • @CassioPolegatto, There is another thing, the fifth regex `\B[A-Z](?=[a-z])` has to be edited because it will not match the `A` in `ex:Ample` in the string `will change this ex::AMple, but not this ex:Ample`, the new regex would be something like this `(?:(?<=[a-z]:)|\B)[A-Z](?=[a-z])`. – SaSkY Jan 03 '23 at 05:00
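
For anyone who wants to try the two adjustments from the last comments outside the browser, here is a rough sketch of the posted pipeline as a plain function. The name `toBrailleCaps` is made up for this illustration, the second pattern is written without the stray bracket characters, and the third and fifth patterns are the variants @SaSkY suggests above. Treat it as a sketch under those assumptions, not as the final code.

```javascript
// Hypothetical helper name; the five steps mirror the posted MyFunction.
function toBrailleCaps(text) {
  return text
    // 1. ":" before every run of uppercase letters
    .replace(/[A-Z]+/g, ':$&')
    // 2. a second ":" before runs of 2+ capitals and before single-letter all-caps words
    .replace(/([A-Z]{2,}|\b[A-Z]+\b)/g, ':$&')
    // 3. "-" before four or more consecutive all-caps words (trailing \b as suggested in the comment)
    .replace(/\B(::[A-Z]+(\s+::[A-Z]+){3,})\b/g, '-$&')
    // 4. drop "::" inside the hyphenated run, keeping it on the first and last word
    .replace(/(?<=-::[A-Z]+\s(?:::[A-Z]+\s)*)::(?=[A-Z]+\s)(?![A-Z]+\s(?!::[A-Z]+\b))/g, '')
    // 5. ";" where a word drops from uppercase to lowercase (variant suggested in the comment)
    .replace(/(?:(?<=[a-z]:)|\B)[A-Z](?=[a-z])/g, '$&;');
}

console.log(toBrailleCaps("This is my finAl EXAMple."));
console.log(toBrailleCaps("This is A VERY GOOD EXAMple"));
```

With the original fifth pattern, `fin:Al` would be left without its `;` because the `A` sits right after a colon, which is a word boundary. The lookbehind alternative is what picks it up.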

1 Answer

So I think your approach is good, and the first replace seems to get the single colons into the right place. The second one screws up on single letter words like A and I. I would fix that with an added alternation:

/([A-Z]{2,}|\b[A-Z]+\b)/g
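
To make the difference concrete, here is a small sketch comparing the second replacement with and without the added alternative. The function names are invented for the example, and the pattern is written without stray bracket characters.

```javascript
// Hypothetical helper names, used only for this comparison.
const addFirstColon = (s) => s.replace(/[A-Z]+/g, ':$&');
// Second step as originally attempted: only runs of two or more capitals.
const secondWithoutAlternation = (s) => s.replace(/[A-Z]{2,}/g, ':$&');
// Second step with the alternative that also catches one-letter all-caps words.
const secondWithAlternation = (s) => s.replace(/([A-Z]{2,}|\b[A-Z]+\b)/g, ':$&');

const afterFirst = addFirstColon("This is A VERY long Example");
console.log(afterFirst);
console.log(secondWithoutAlternation(afterFirst));
console.log(secondWithAlternation(afterFirst));
```

The `\b[A-Z]+\b` branch only fires when the capitals form a whole word, so `:This` and `:Example` keep their single colon while the stand-alone `A` gets its second one.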

Now you need to add two more replacements; one to add the hyphen, and the other to remove the double colons.

For the hyphen you just search for three or more ::ALLCAPS whitespace combos like this:

/\B(::[A-Z]+(\s+::[A-Z]+){2,})/g

The \B handles caps at the very beginning of the string. I replaced with hyphen and $1.
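
As a quick check of the hyphen step on its own, the following sketch applies that pattern to text that already has its colons. `addHyphen` is a name made up here, and the inputs are invented test strings.

```javascript
// Hypothetical helper: prefix "-" to a run of three or more "::WORD" tokens.
const addHyphen = (s) => s.replace(/\B(::[A-Z]+(\s+::[A-Z]+){2,})/g, '-$1');

// A run of four all-caps words gets the hyphen; the two-word run "::AS ::AN" does not.
console.log(addHyphen(":This is ::A ::VERY ::LONG ::SENTENCE to serve ::AS ::AN :Example"));

// \B also holds at the very start of the string, because the first character is ":".
console.log(addHyphen("::ALL ::CAPS ::HERE now"));
```

The same `\B` keeps the pattern from starting inside a word such as `ex::AMple`, since the position between `x` and `:` is a word boundary.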

To remove the double colons, I got a little trickier with a lookbehind and a lookahead:

/(?<=::[A-Z]+\s*)::([A-Z]+)(?=\s*::[A-Z]+)/g

This one is just replaced with $1. Luckily JavaScript supports variable-length lookbehinds.
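
Here is a short sketch of that removal step in isolation, assuming the hyphen has already been added. `stripInnerColons` is an invented name and the inputs are made-up test strings.

```javascript
// Hypothetical helper: drop "::" from a word that has a "::WORD" on both sides.
const stripInnerColons = (s) =>
  s.replace(/(?<=::[A-Z]+\s*)::([A-Z]+)(?=\s*::[A-Z]+)/g, '$1');

// Inner words lose their colons; the first and last word of the run keep them.
console.log(stripInnerColons(":This is -::A ::VERY ::LONG ::SENTENCE to serve ::AS ::AN :Example"));

// The pattern does not look for the hyphen, so any three-word run is affected too.
console.log(stripInnerColons("::LOL ::TEST ::LOL"));
```

The lookbehind is checked against the original string rather than the partially replaced one, which is why each inner word still sees a `::WORD` before it.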

Here it is working on Regex101.

I did not look at your last replacement. Superficially it seemed to be OK.

Chris Maurer
  • Chris, I've tested it too and it works perfectly, thank you! My last one seems to be working along with your code as well. – Cassio Polegatto Dec 31 '22 at 18:13
  • Chris Maurer, you did a good job, but there is one thing to note here: the last regex will match `:This is -::A ::GOOD ::EXAMPLE`, replace `::GOOD` with `GOOD`, and output `:This is -::A GOOD ::EXAMPLE`. @CassioPolegatto mentioned that "-" is added when the sequence is more than three uppercase words in a row, so what I'm saying here is not important unless the sequence should be strictly more than 3 uppercase words. – SaSkY Dec 31 '22 at 18:43