How to apply style to all paragraphs with upper case text?

Question

I have a very large HTML document containing plenty of paragraphs. Headings are written as UPPER CASE text within paragraphs.

How can I find all paragraphs containing UPPER CASE text and apply a style to these paragraphs?

There is also plenty of extra spacing around the text in most paragraphs. Sample of existing headings:

<p>                                                   </p>
<p>                      USU EA EUISMOD HONESTATIS DETERRUISSET.</p>
<p>Qualisque mnesarchum no nam, usu cu fastidii delicata. Eu mei nonumy libris, quas movet vivendo vim at. Prima epicuri conceptam pro ad, in suas nonumes similique duo. Qui mundi essent complectitur eu. Ei laudem veritus democritum vis, te ferri appareat eos. Ceteros pertinacia ea eum, quo integre theophrastus ex, eum et sint omnes detracto. Ea vim brute labore. Vim te esse libris erroribus, ex minimum tacimates dissentiet duo. Ignota iisque in mei, pri sanctus albucius omnesque id. Laoreet docendi theophrastus ei pri, duo wisi tollit decore ea, tempor doctus vivendo sed ad. </p>
<p>Usu ea euismod honestatis deterruisset. Ne quo malis meliore, duo viris liberavisse no, mea an vide mutat quodsi. Vis an vidit debitis, et noster aliquam pri, case iudicabit te sea. Cum sadipscing consectetuer cu, an nominavi consulatu adversarium sea, nam ad dico evertitur voluptaria. Id justo viderer bonorum per, in ius impedit tincidunt, nec et quis scaevola. Cu congue iriure scaevola usu. Ei elit reformidans suscipiantur eos, cum ut doming iracundia.  </p>
<p>                                                                             </p>
<p>                       CU CONGUE IRIURE SCAEVOLA   --
   UT DOMING IRACUNDIA. </p>
<p>                                  DICO TEMPOR HABEMUS.</p>
<p>Homero everti ei nam. An liber euripidis vis, pericula persecuti deseruisse ad mea. Dicant offendit sea et, per esse timeam deserunt ut. In pri enim sadipscing, ei movet soleat suavitate vim. Mea et omnesque phaedrum, paulo luptatum concludaturque vim ea. -- LIBER. </p>

I want to apply a style to the UPPER CASE text (headings) inside paragraph tags to make it bold.

The block above should look like this after running the regular expression replace(s) or the UltraEdit macro:

<p>                                                   </p>
<p class="bold">                      USU EA EUISMOD HONESTATIS DETERRUISSET.</p>
<p>Qualisque mnesarchum no nam, usu cu fastidii delicata. Eu mei nonumy libris, quas movet vivendo vim at. Prima epicuri conceptam pro ad, in suas nonumes similique duo. Qui mundi essent complectitur eu. Ei laudem veritus democritum vis, te ferri appareat eos. Ceteros pertinacia ea eum, quo integre theophrastus ex, eum et sint omnes detracto. Ea vim brute labore. Vim te esse libris erroribus, ex minimum tacimates dissentiet duo. Ignota iisque in mei, pri sanctus albucius omnesque id. Laoreet docendi theophrastus ei pri, duo wisi tollit decore ea, tempor doctus vivendo sed ad. </p>
<p>Usu ea euismod honestatis deterruisset. Ne quo malis meliore, duo viris liberavisse no, mea an vide mutat quodsi. Vis an vidit debitis, et noster aliquam pri, case iudicabit te sea. Cum sadipscing consectetuer cu, an nominavi consulatu adversarium sea, nam ad dico evertitur voluptaria. Id justo viderer bonorum per, in ius impedit tincidunt, nec et quis scaevola. Cu congue iriure scaevola usu. Ei elit reformidans suscipiantur eos, cum ut doming iracundia.  </p>
<p>                                                                             </p>
<p class="bold">                       CU CONGUE IRIURE SCAEVOLA   --
   UT DOMING IRACUNDIA. </p>
<p class="bold">                                  DICO TEMPOR HABEMUS.</p>
<p>Homero everti ei nam. An liber euripidis vis, pericula persecuti deseruisse ad mea. Dicant offendit sea et, per esse timeam deserunt ut. In pri enim sadipscing, ei movet soleat suavitate vim. Mea et omnesque phaedrum, paulo luptatum concludaturque vim ea. -- LIBER. </p>

As some paragraphs contain mixed upper case and lower case text, the regex must be limited to paragraphs whose text is entirely UPPER CASE, without lower case letters. There can also be line breaks within a paragraph.

How can this be accomplished using a macro or code in UltraEdit for Linux? (Or the Windows version, as the regexes are the same anyway.)

I want to apply a class to the paragraphs (instead of making them headers H1, H2, etc.) because ebook readers (Kindle, etc.) may display headers in unpredictable ways. The document encoding is UTF-8 with Cyrillic text.

Mofi · Accepted Answer · 2016-07-29T10:37:36.083

Regular expression support in UltraEdit

UltraEdit v11.20, the version mentioned in the original question before it was edited, is very old and does not support regular expression finds/replaces in Perl syntax. It supports only UltraEdit and Unix syntax; Unix syntax is similar to Perl but far more limited in its capabilities.

Support for Perl regular expression finds/replaces was introduced with UltraEdit for Windows v12.00, released on 2006-03-15. Since then there have been many minor and a few major updates to UltraEdit's Perl regular expression support. The minor updates were bug fixes. The major updates, for example in UE v19.00 and UE v21.20, introduced a newer version of the Boost regular expression library embedded in UltraEdit for Windows, with enhancements to the regular expression engine itself.

I don't know which Perl-syntax regular expression library is used by UltraEdit on Mac and on Linux. The regular expression libraries on different platforms and in different versions have much in common, but there are also differences. So for complex Perl regular expression finds/replaces, the platform and the version of UltraEdit, or rather the version of the regular expression library it uses, must be taken into account. No single Perl regular expression library has been used by all applications on all platforms in all versions over the last 20 years.

Character set (code page) depending solutions

With UltraEdit for Windows v11.20 or any later version, use UltraEdit regular expressions for this task with the following search and replace strings, and additionally check Match Case in the replace window:

Find what: <p^(>[~A-Za-z<>]++[A-Z][^t^r^n -`{-~]++</p>^)
Replace with: <p class="bold"^1

This is a tagged expression in UltraEdit syntax.

It searches for <p> followed by 0 or more characters that are neither ASCII letters (in any case) nor angle brackets, then at least 1 upper case ASCII letter, then 0 or more ASCII characters other than the lower case ASCII letters, up to the closing </p>. The third character class assumes that < in paragraph text is already encoded as &lt; and > as &gt;, as required by the HTML/XHTML and XML standards.

The third character class [^t^r^n -`{-~] contains two unusual character ranges that require knowledge of the ASCII table. The first runs from space to grave accent and covers many common punctuation marks, the digits 0-9 and the upper case ASCII letters. The second runs from left curly bracket to tilde and covers the remaining non-word characters in the ASCII range.
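The two ranges are easier to see when they are listed out. The following is only an illustrative sketch in JavaScript, not something UltraEdit runs; the helper name charsInRange is hypothetical. It enumerates both ranges and shows that lower case ASCII letters fall in the gap between them.

```javascript
// List every character between two ASCII characters, inclusive.
// charsInRange is a hypothetical helper used only for this illustration.
function charsInRange(first, last) {
  const chars = [];
  for (let code = first.charCodeAt(0); code <= last.charCodeAt(0); code++) {
    chars.push(String.fromCharCode(code));
  }
  return chars;
}

const spaceToGrave = charsInRange(" ", "`"); // 0x20..0x60
const braceToTilde = charsInRange("{", "~"); // 0x7B..0x7E
const allowed = new Set([...spaceToGrave, ...braceToTilde]);

console.log(spaceToGrave.join("")); // punctuation, digits, upper case letters
console.log(braceToTilde.join("")); // {|}~
// Upper case letters and digits are inside the ranges, lower case letters are not.
console.log(allowed.has("A"), allowed.has("7"), allowed.has("a")); // true true false
```

The lower case letters a-z occupy 0x61..0x7A, exactly the gap between the grave accent (0x60) and the left curly bracket (0x7B), which is why the class accepts everything printable in ASCII except them.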

The same regular expression replace in Unix/Perl syntax:

Find what: <p(>[^A-Za-z<>]*[A-Z][\t\r\n -`{-~]*</p>)
Replace with: <p class="bold"\1
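To check the logic outside of UltraEdit, the same idea can be tried in JavaScript. This is a minimal sketch under assumptions: the pattern is my transcription into JavaScript regex syntax and ends at the closing paragraph tag, the function name markAsciiHeadings is hypothetical, and the sample is a shortened version of the text from the question.

```javascript
// ASCII-only pattern: <p>, optional non-letter prefix, one upper case letter,
// then any ASCII characters except lower case letters, up to </p>.
const asciiHeading = /<p(>[^A-Za-z<>]*[A-Z][\t\r\n -`{-~]*<\/p>)/g;

// markAsciiHeadings is a hypothetical name for this illustration.
function markAsciiHeadings(html) {
  return html.replace(asciiHeading, '<p class="bold"$1');
}

const sample = [
  "<p>          </p>",
  "<p>    CU CONGUE IRIURE SCAEVOLA   --",
  "   UT DOMING IRACUNDIA. </p>",
  "<p>Homero everti ei nam. -- LIBER. </p>",
].join("\n");

// Only the two-line upper case paragraph receives the class.
console.log(markAsciiHeadings(sample));
```

The empty paragraph is skipped because it contains no upper case letter, and the mixed-case paragraph is skipped because the first lower case letter stops the match before the closing tag is reached.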

Other upper case characters, like the German characters ÄÖÜ, can also be added to the character classes inside the 3 pairs of square brackets. In this case the language-specific lower case characters like äöüß must also be added to the first character class to exclude them from a positive match.

A negative character class can also be used instead of a positive character class, with the option Match Case checked.

Example in UltraEdit syntax:

Find What: <p^(>[~A-Za-z<>ÄÖÜäöüß]+[A-ZÄÖÜ][~a-z<>äöüß]++</p>^)
Replace With: <p class="bold"^1

This has the advantage that all characters except the lower case characters specified in the negative character classes and the angle brackets are treated as valid heading characters, which includes many characters from the upper half of the character set / code page in use.

This task would be easier with a version of UltraEdit newer than v11.20, because the Perl regular expression engine has predefined character classes for lower case characters and for upper case characters according to the Unicode definition.
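The negative-class variant can be sketched the same way. The snippet below is an illustration in JavaScript, not UltraEdit syntax; the names upperClass, lowerClass and markGermanHeadings are hypothetical, and it assumes the German letters are stored as single precomposed characters.

```javascript
// Assumption: precomposed German letters, as in the example above.
const upperClass = "A-ZÄÖÜ";
const lowerClass = "a-zäöüß";

// Prefix: no letters or angle brackets; then one upper case letter;
// then anything except lower case letters and angle brackets, up to </p>.
const germanHeading = new RegExp(
  `<p(>[^A-Za-zÄÖÜäöüß<>]*[${upperClass}][^${lowerClass}<>]*</p>)`,
  "g"
);

// markGermanHeadings is a hypothetical name for this illustration.
function markGermanHeadings(html) {
  return html.replace(germanHeading, '<p class="bold"$1');
}

// The section sign is accepted because it is not a listed lower case letter.
console.log(markGermanHeadings("<p>   ÜBER DIE GRÖSSE, § 1.</p>"));
// Mixed case stays untouched.
console.log(markGermanHeadings("<p>Über die Größe.</p>"));
```

Listing only what is forbidden means that punctuation and symbols outside ASCII pass without being named, at the cost of having to list every lower case letter of the language by hand.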

Unicode solutions using Perl

A Perl regular expression replace is required for a solution that does not depend on local character sets / code pages, because it uses the character definitions of the Unicode standard.

But not all Perl regular expression libraries in all versions may support the expressions as written below.

The posted Perl regular expressions were tested with UltraEdit for Windows v22.20.0.49 (last public version of UE for Windows XP) and v23.20.0.28 (currently latest version of UE for Windows Vista and later Windows).

The Boost Perl regular expression library used by UltraEdit for Windows supports several character classes. The most interesting here are [:upper:] for any upper case word character and [:lower:] for any lower case character.

Examples with Perl regular expression:

Find what: <p(>\W*?[[:upper:]][^[:lower:]]+?</p>)
Replace with: <p class="bold"\1

Find what: <p(>\W*?[[:upper:]][[:upper:]\W]*?</p>)
Replace with: <p class="bold"\1

\W is a common "single character" character class for any non-word character.

The "single character" character class for all lower case characters is \l. And \u is the "single character" character class for all upper case characters. Those shorter character classes can also be used in the search strings:

Find what: <p(>\W*?\u[^\l]+?</p>)
Replace with: <p class="bold"\1

Find what: <p(>\W*?\u[\u\W]*?</p>)
Replace with: <p class="bold"\1

All expressions posted here make sure that the paragraph contains at least 1 upper case character.
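For readers who want to test the Unicode approach outside of UltraEdit, here is a hedged JavaScript sketch. JavaScript does not support the [:upper:], [:lower:], \u or \l classes used above, so this swaps in the Unicode property escapes \p{Lu} and \p{Ll}, which is my substitution and not the Boost syntax. The function name markUnicodeHeadings and the Cyrillic sample text are assumptions, and unlike the expressions above the last class also excludes angle brackets.

```javascript
// \p{L}  = any letter, \p{Lu} = upper case letter, \p{Ll} = lower case letter.
// The "u" flag is required for \p{...} property escapes.
const unicodeHeading = /<p(>[^\p{L}<>]*\p{Lu}[^\p{Ll}<>]*<\/p>)/gu;

// markUnicodeHeadings is a hypothetical name for this illustration.
function markUnicodeHeadings(html) {
  return html.replace(unicodeHeading, '<p class="bold"$1');
}

const cyrillic = [
  "<p>        ГЛАВА ПЕРВАЯ.</p>",
  "<p>Это обычный абзац. -- КОНЕЦ. </p>",
].join("\n");

// The first paragraph is all upper case and gets the class;
// the second has lower case letters and is left alone.
console.log(markUnicodeHeadings(cyrillic));
```

Because the classes come from Unicode categories, the same pattern handles Latin and Cyrillic text without knowing the code page.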

I have UltraEdit v15.1 for Linux installed. Two points I forgot: the HTML page uses a Cyrillic charset, not Latin, and the regular expression needs to find only paragraphs where **all** characters are UPPER CASE; otherwise it will not work correctly, as there is a lot of mixed upper case/lower case text inside paragraph tags. – fxgreen, Jul 29 '16 at 11:50
That all word characters in a paragraph must be upper case characters is clear. The Unicode solution works for word characters in any language. For the code page depending solution for Cyrillic text the code page is important. There is [Windows-1251](https://en.wikipedia.org/wiki/Windows-1251) and [ISO 8859-5](https://en.wikipedia.org/wiki/ISO_8859-5) and [OEM 855](https://en.wikipedia.org/wiki/Code_page_855) and some others. The code page must be known for a non Unicode solution. – Mofi, Jul 29 '16 at 12:06
I tried the Unicode Perl regexes `<p(>\W*?[[:upper:]][^[:lower:]]+?</p>)` and `<p(>\W*?[[:upper:]][[:upper:]\W]*?</p>)` and found that they match paragraphs with mixed text and skip those that are all UPPER CASE. Same thing with the `<p(>\W*?\u[\u\W]*?</p>)` regex. Possibly due to a different Perl regular expression library being used in the UE Linux versions? – fxgreen, Jul 29 '16 at 13:44
I don't have a Linux machine and therefore don't have UEX. So I can't check whether the Unicode Perl regex replaces did not work in UEX because of a different regex library, because you forgot to check the regular expressions option in the replace window, because you have not selected Perl as the regular expression engine in UEX, or because you have not checked the match case option. I can only guarantee that the code page depending replaces worked with UE for Windows v11.20, v22.20 and v23.20 for Latin I text in a Windows-1252 encoded file, and the Unicode regex strings for Latin I and Cyrillic text in a Unicode file. – Mofi, Jul 29 '16 at 15:10

score 0 · Answer 2 · answered Jul 21 '16 at 13:32

You can use the following styles:

1. uppercase: text-transform: uppercase;

2. lowercase: text-transform: lowercase;

3. capitalize: text-transform: capitalize;

Output

THIS IS SOME TEXT.

this is some text.

This Is Some Text.

NIKHIL RANE


score 0 · Answer 3 · edited May 23 '17 at 11:51

The simplest solution that first comes to my mind is this:

You can add a CSS class which converts any text inside it to UPPERCASE

.uppercase {
    text-transform: uppercase;
}

to each <p> where you want UPPERCASE letters. Then you can apply any other styling, like p.uppercase {color:red;}. In your case it would be p.uppercase.bold {...}

Another way is to add a custom JS function, like in this answer, to check whether the text inside <p> is in UPPERCASE. If it is, add your custom class.

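$(function () {
  // Helper assumed by the original snippet: true when the text has at least
  // one cased letter and no lower case letters.
  function isUpperCase(text) {
    return text !== text.toLowerCase() && text === text.toUpperCase();
  }
  $('p').each(function () {        // visit all p-elements
    var $p = $(this);              // wrap the DOM node so jQuery methods work
    if (isUpperCase($p.text())) {  // if p-text is in UPPERCASE
      $p.addClass('bold');         // add class bold (name without a leading dot)
    }
  });
});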

JSFiddle Example

To work with Regex check this answer Find uppercase letters within tags using regex

But my goal is **not** to change the letter case of the text inside paragraphs; the goal is to apply a style to paragraphs containing UPPER CASE text. Not with JavaScript, but possibly by Find & Replace or regex. I need to prepare (clean) the HTML document for further conversion. – fxgreen, Jul 21 '16 at 13:50

score 0 · Answer 4 · answered Jul 21 '16 at 15:18

Using the Vim editor you can do it with the following command:

:g/<p>[A-Z ]\{-}<\/p>/ s/\s\{2,}/ /g | s/<p>/<p class="bold">/g

Note that it does not work if your <p> tag spans multiple lines, like:

<p>
  UPPER  CASE  TEXT
</p>
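For the multi-line case, a regex that is allowed to cross line breaks is needed. The following is only a sketch in JavaScript, not a Vim command; the function name boldAndTidy is hypothetical, it uses Unicode property escapes instead of [A-Z ], and unlike the Vim command it also trims leading and trailing whitespace inside the paragraph.

```javascript
// Matches a paragraph whose text has an upper case letter and no lower case
// letter, even when the text spans several lines.
const multiLineHeading = /<p>([^\p{L}<>]*\p{Lu}[^\p{Ll}<>]*)<\/p>/gu;

// boldAndTidy is a hypothetical name for this illustration.
function boldAndTidy(html) {
  return html.replace(multiLineHeading, (match, text) => {
    // Collapse runs of whitespace (including line breaks) to one space.
    const tidy = text.replace(/\s+/g, " ").trim();
    return `<p class="bold">${tidy}</p>`;
  });
}

const input = "<p>\n  UPPER  CASE  TEXT\n</p>\n<p>Mixed  case  TEXT</p>";
console.log(boldAndTidy(input));
// prints:
// <p class="bold">UPPER CASE TEXT</p>
// <p>Mixed  case  TEXT</p>
```

Whitespace is only cleaned up inside the paragraphs that match, so body text keeps its original spacing.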

dNitro


I'm not very familiar with Vim. Is it possible to do the same with Gedit [Advanced Find/Replace](https://s32.postimg.org/3t9oil5g5/plugin2.png)? Also, some paragraphs have text that spans two lines. – fxgreen Jul 21 '16 at 21:21
You do not need any familiarity with Vim. **First back up your files**, then open a terminal, `cd` to where your HTML documents are and run `vim -c 'g/<p>[A-Z ]\{-}<\/p>/ s/\s\{2,}/ /g | s/<p>/<p class="bold">/g' -c wq example.html`. Replace *example.html* with your HTML file. – dNitro Jul 21 '16 at 21:40
I cannot install Vim due to restrictions. vim-tiny is installed, but it is not suitable for this. Is there a way to do this with Gedit regex? – fxgreen Jul 22 '16 at 18:07
