I receive rich text in a request, and it can contain multiple images as base64 strings. I need to collect all the images and their corresponding file names. So far I have tried the code below and got it working for a single image. How can this code be improved, or is there a more efficient way to do this? I am not familiar with regex, so I did not try that. Any help is appreciated.

In the example below I have assumed that only one image is present in the rich text.

public static void main(String[] args) {
        String richText = "<div class=\"se-component se-image-container __se__float-center\" contenteditable=\"false\"><figure style=\"margin: auto; width: 300px;\"><img src=\"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABaAAAANNCAYAAABhqtmuAAAAAXNSR0IArs4c6QAAQABJREFUeAHs3QmcXtP5B/AniyAkEWuIkA0Va621BUG1CLV1sasqitr3/m21Vamllmq1VdqqorbaRYh9X0utScQWJbEFSUX+99yY6WRMZiYyZ7x53+/1ycz73vfe557zvWdmPn5z5rwdIuLy4p+NAAECBAgQIECAAAECBAgQIECAAAECBAi0qUDHNq2mGAECBAgQIECAAAECBAgQIECAAAECBAgQ+FxAAG0oECBAgAABAgQIECBAgAABAgQIECBAgEAWAQF0FlZFCRAgQIAAAQIECBAgQIAAAQIECBAgQEAAbQwQIECAAAECBAgQIECAAAECBAgQIECAQBYBAXQWVkUJECBAgAABAgQIECBAgAABAgQIECBAQABtDBAgQIAAAQIECBAgQIAAAQIECBAgQIBAFgEBdBZWRQkQIECAAAECBAgQIECAAAECBAgQIEBAAG0MECBAgAABAgQIECBAgAABAgQIECBAgEAWAQF0FlZFCRAgQIAAAQIECBAgQIAAAQIECBAgQEAAbQwQIECAAAECBAgQIECAAAECBAgQIECAQBYBAXQWVkUJECBAgAABAgQIECBAgAABAgQIECBAQABtDBAgQIAAAQIECBAgQIAAAQIECBAgQIBAFgEBdBZWRQkQIECAAAECBAgQIECAAAECBAgQIEBAAG0MECBAgAABAgQIECBAgAABAgQIECBAgEAWAQF0FlZFCRAgQIAAAQIECBAgQIAAAQIECBAgQEAAbQwQIECAAAECBAgQIECAAAECBAgQIECAQBYBAXQWVkUJECBAgAABAgQIECBAgAABAgQIECBAQABtDBAgQIAAAQIECBAgQIAAAQIECBAgQIBAFgEBdBZWRQkQIECAAAECBAgQIECAAAECBAgQIEBAAG0MECBAgAABAgQIEngAAAAASUVORK5CYII=\" alt=\"\" data-rotate=\"0\" data-proportion=\"true\" data-align=\"center\" data-size=\"300px,300px\" data-index=\"0\" data-file-name=\"Upload Question.png\" data-file-size=\"38747\" data-origin=\"300px,300px\" style=\"width: 300px; height: 300px;\"></figure>";
        String[] texts = StringUtils.substringsBetween(richText, "<img", "</figure>");
        for (String td : texts) {
            String fileName = StringUtils.substringBetween(td, "data-file-name=\"", "\"");
            System.out.println("fileName:" + fileName); //prints fileName:Upload Question.png
            String base64 = StringUtils.substringBetween(td, ",", "\"");
            System.out.println(base64); // prints the base64 payload of the src attribute (the string starting with iVBORw0KGgo in richText above)
        }
    }
TheNightsWatch
  • [I'd use an HTML parser](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). – ggorlen Jun 10 '20 at 17:31
  • Why not use https://jsoup.org/ for filtering? That should get you past the problem of multiple images. – stackguy Jun 10 '20 at 17:40

1 Answer

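In JavaScript, the regex /(?<=base64,).+?(?=\")/gi would solve your problem: it matches everything after "base64," up to the next double quote, and the g flag makes it return every such match in the text.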
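
Since the question is about Java, the same idea can be sketched with java.util.regex from the standard library. This is only a minimal sketch, not a definitive implementation: the class name InlineImageExtractor and the method extractImages are hypothetical, and it assumes that every image is an <img> tag whose src is a base64 data URI, that the tag carries a data-file-name attribute, that attribute values are double-quoted and contain no ">" character, and that file names are unique within one rich text. For arbitrary HTML, an HTML parser as suggested in the comments is likely the more robust choice.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InlineImageExtractor {
    // One <img ...> tag (assumes no '>' inside attribute values).
    private static final Pattern IMG_TAG =
            Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // Captures the payload after "base64," inside a data-URI src attribute.
    private static final Pattern BASE64_SRC =
            Pattern.compile("\\ssrc=\"data:image/[^;\"]+;base64,([^\"]+)\"", Pattern.CASE_INSENSITIVE);
    // Captures the value of the data-file-name attribute.
    private static final Pattern FILE_NAME =
            Pattern.compile("\\sdata-file-name=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns file name -> base64 payload, in document order. */
    public static Map<String, String> extractImages(String richText) {
        Map<String, String> images = new LinkedHashMap<>();
        Matcher img = IMG_TAG.matcher(richText);
        while (img.find()) {
            String tag = img.group();
            Matcher src = BASE64_SRC.matcher(tag);
            Matcher name = FILE_NAME.matcher(tag);
            // Skip tags that lack either attribute instead of failing.
            if (src.find() && name.find()) {
                images.put(name.group(1), src.group(1));
            }
        }
        return images;
    }

    public static void main(String[] args) {
        // Shortened placeholder payloads instead of real image data.
        String richText =
                "<figure><img src=\"data:image/png;base64,AAAA\" data-file-name=\"a.png\"></figure>"
              + "<figure><img src=\"data:image/jpeg;base64,BBBB\" data-file-name=\"b.jpg\"></figure>";
        extractImages(richText)
                .forEach((name, data) -> System.out.println(name + " -> " + data));
    }
}
```

Matching each <img> tag first and then reading both attributes from that one tag keeps every file name paired with its own payload, regardless of attribute order. If two images can share a file name, a list of pairs would be needed instead of a map, because a later entry would overwrite an earlier one.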

Aleks