I cleaned up some of your code to show you what is happening:
private static String setNonHTML(String txt)
{
System.out.println(txt.substring(txt.indexOf("<p style=\"padding-left:") + 23, txt.indexOf("px\"><b>")));
//return "";
//grab the padding text indexes
int beforePaddingIndex = txt.indexOf("<p style=\"padding-left:");
int afterPaddingIndex = txt.indexOf("px\"><b>");
//replace all breaks with new lines
txt = txt.replaceAll("<br>", "\n");
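//replaces every occurrence of the first padded opening tag (e.g. <p style="padding-left:40px"><b>) with a new line plus one tab per 40px of padding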
txt = txt.replaceAll(txt.substring(beforePaddingIndex, afterPaddingIndex + 7), "\n" + repeat("\t", Integer.parseInt(txt.substring(beforePaddingIndex + 23, afterPaddingIndex)) / 40));
//the previous replaceAll changed the text, so the old indexes are stale and must be looked up again. indexOf returns -1 when no matching tag is left.
beforePaddingIndex = txt.indexOf("<p style=\"padding-left:");
afterPaddingIndex = txt.indexOf("px\"><b>");
int afterPaddingBeforeBoldIndex = txt.indexOf("px\">");
//replaces a substring of virtually the same tag a second time; if the first replaceAll already consumed every match, there is nothing left to find
txt = txt.replaceAll(txt.substring(beforePaddingIndex, afterPaddingIndex), "\n" + repeat("\t", Integer.parseInt(txt.substring(beforePaddingIndex + 23, afterPaddingBeforeBoldIndex)) / 40));
txt = txt.replaceAll("(?s)<[^>]*>(\\s*<[^>]*>)*", "\n");
return txt;
}
As you can see, after the first replaceAll there is a second replaceAll that operates on virtually the same indexes. Your code looks those indexes up inline after the first replaceAll, so I set them again here to replicate that behavior. Splitting code into descriptively named variables and sections is good practice and is monumentally helpful when debugging complicated logic. I don't know what output your program is giving you, so I have no way of knowing whether this actually solves your issue, but it does look like a bug, and I believe it gives you a good place to start.
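To make the stale-index problem concrete, here is a minimal sketch. The class name `StaleIndexDemo`, its method names, and the sample HTML are made up for illustration, and it uses the literal `String.replace` instead of `replaceAll` to keep regex escaping out of the picture. It shows that once the first replace has consumed the only padded tag, the repeated `indexOf` calls return -1 and the following `substring` call throws.

```java
class StaleIndexDemo {
    static final String OPEN = "<p style=\"padding-left:";
    static final String CLOSE = "px\"><b>";

    // Mirrors the first replace step: swap the whole opening tag for "\n\t".
    static String firstReplace(String txt) {
        int before = txt.indexOf(OPEN);
        int after = txt.indexOf(CLOSE);
        String tag = txt.substring(before, after + CLOSE.length());
        // String.replace is literal, so the tag is not interpreted as a regex
        return txt.replace(tag, "\n\t");
    }

    // True when repeating the lookup on the already-replaced text fails.
    static boolean secondLookupThrows(String txt) {
        String replaced = firstReplace(txt);
        int before = replaced.indexOf(OPEN);  // -1 when no padded tag is left
        int after = replaced.indexOf(CLOSE);  // -1 when no padded tag is left
        try {
            replaced.substring(before, after);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String html = "<p style=\"padding-left:40px\"><b>Title</b></p>";
        System.out.println(firstReplace(html).indexOf(OPEN));
        System.out.println(secondLookupThrows(html));
    }
}
```

With input that contains a second tag with a different padding value, the second lookup finds that tag instead, so whether you hit this depends on your data.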
As for what you should do to fix this, you may want to look into an off-the-shelf solution such as HtmlCleaner: http://htmlcleaner.sourceforge.net/javause.php
It lets you traverse and modify HTML programmatically, read off attributes like padding-left, and extract the content between tags.
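If pulling in a library is not an option, a single regex pass with the JDK's `java.util.regex` avoids the index bookkeeping entirely. This is only a sketch under a few assumptions: the class name `PaddingToTabs` and its methods are hypothetical, every padded paragraph is written exactly as `<p style="padding-left:NNpx">`, one tab corresponds to 40px as in your code, and any remaining tags are simply stripped rather than turned into new lines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PaddingToTabs {
    // Matches <p style="padding-left:NNpx"> and captures the number NN.
    private static final Pattern PADDED_P =
            Pattern.compile("<p style=\"padding-left:(\\d+)px\">");

    // Small helper so the sketch also runs on Java versions without String.repeat.
    static String repeat(String s, int times) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < times; i++) {
            sb.append(s);
        }
        return sb.toString();
    }

    static String toPlainText(String html) {
        // Literal replacement of line breaks
        String txt = html.replace("<br>", "\n");

        // Visit every padded <p> once and emit a new line plus one tab per 40px
        Matcher m = PADDED_P.matcher(txt);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int padding = Integer.parseInt(m.group(1));
            String indent = "\n" + repeat("\t", padding / 40);
            m.appendReplacement(out, Matcher.quoteReplacement(indent));
        }
        m.appendTail(out);

        // Assumption: drop whatever tags are left instead of replacing them
        return out.toString().replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String html = "<p style=\"padding-left:40px\"><b>Title</b></p>"
                + "<p style=\"padding-left:80px\"><b>Sub</b><br>line</p>";
        System.out.println(toPlainText(html));
    }
}
```

Because each match carries its own padding value in `m.group(1)`, tags with different padding are handled in the same loop and nothing has to be looked up again after the text changes. A real HTML parser is still the more robust choice if the markup varies.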