I had some text, parsed from XML, that looked like this:

06:00 Vesti<br>07:15 Something Else<br>09:10 Movie<a href="..."> ... <br>15:45 Something..

and there was a lot of it.

So I did this:

String mim = ses.replaceAll("(?s)\\<.*?\\>", " \n");

There was no other way to display the text nicely. Now, after displaying it a few times, I need that same text split into separate strings like this:

06:00 Vesti   

... or

07:15 Something Else

I've tried something like this, but it does not work:

char[] rast = description.toCharArray();
    int brojac = 0;
    for(int q=0; q<description.length(); q++){
        if(rast[q]=='\\' && rast[q+1]=='n' ) brojac++;
    }
    String[] niz = new String[brojac];

    int bf1=0;
    int bf2=0;
    int bf3=0;
    int oo=0;

    for(int q=0; q<description.length(); q++){
        if(rast[q]=='\\'&& rast[q+1]=='n'){
            bf3=bf1;
            bf1=q;

            String lol = description.substring(bf3, bf1);
            niz[oo]=lol;
            oo++;
        }
    }

I know that the indexes in description.substring(bf3, bf1) are not set as they should be, but I think that this:

if(rast[q]=='\\' && rast[q+1]=='n')

does not work that way. Is there any other solution?

Note: there is no other way to get that resource; it must be done through this.

Igx33

2 Answers

1

Calling Html.fromHtml(String) will properly translate the <br> into \n.

String html = "06:00 Vesti<br>07:15 Something Else<br>09:10 Movie<a href=\"...\"> ... <br>15:45 Something..";
String str = Html.fromHtml(html).toString();
String[] arr = str.split("\n");

Then, just split it line by line. There is no need for regexps (which you shouldn't be using to parse HTML in the first place).
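If Html.fromHtml is not available (for example, outside Android), roughly the same result can be sketched in plain Java on the tag-stripped string. Note that "\n" in a Java string literal is a single newline character, so a check for a backslash followed by the letter n never matches it. The class name SplitListing and the method splitEntries are hypothetical, and the sketch assumes every entry starts with an HH:mm time.

```java
import java.util.ArrayList;
import java.util.List;

class SplitListing {
    // Replaces every tag with a newline (as in the question), then splits on real newlines.
    static List<String> splitEntries(String raw) {
        String stripped = raw.replaceAll("(?s)<.*?>", "\n");
        List<String> entries = new ArrayList<>();
        for (String line : stripped.split("\n")) {
            String entry = line.trim();
            // Assumption: a valid entry starts with HH:mm followed by whitespace.
            // This drops blank lines and leftover link text such as "...".
            if (entry.matches("\\d{2}:\\d{2}\\s.*")) {
                entries.add(entry);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        String raw = "06:00 Vesti<br>07:15 Something Else<br>09:10 Movie"
                + "<a href=\"...\"> ... <br>15:45 Something..";
        for (String entry : splitEntries(raw)) {
            System.out.println(entry);
        }
    }
}
```

The filter on the time prefix is a design choice for this input; if entries can span several lines, it would need to be relaxed.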

Edit: Turning everything into a bunch of Dates

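import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Used to find the HH:mm, in case the input is wonky
Pattern p = Pattern.compile("([0-2][0-9]:[0-5][0-9])");
SimpleDateFormat fmt = new SimpleDateFormat("HH:mm");
SortedMap<Date, String> programs = new TreeMap<Date, String>();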
for (String row : arr) {
    Matcher m = p.matcher(row);
    if (m.find()) {
        // We found a time in this row
        ParsePosition pp = new ParsePosition(m.start(0));
        Date when = fmt.parse(row, pp);
        String title = row.substring(pp.getIndex()).trim();
        programs.put(when, title);
    }
}
// Now programs contains the sorted list of programs. Unfortunately, since
// SimpleDateFormat is stupid, they're all placed back in 1970 :-D.
// This would give you an ordered printout of all programs *AT OR AFTER* 08:00
// (parse(String) throws the checked ParseException, so handle or declare it)
Date filter = fmt.parse("08:00");
SortedMap<Date, String> after0800 = programs.tailMap(filter);
// Since this is a SortedMap, after0800.values() will return the program names in order.
// You can also iterate over each entry like so:
for (Map.Entry<Date,String> program : after0800.entrySet()) {
    // You can use the SimpleDateFormat to pretty-print the HH:mm again.
    System.out.println("When:" + fmt.format(program.getKey()));
    System.out.println("Title:" + program.getValue());            
}
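The same sorted-map idea can also find the program that is on at a given time: the entry with the latest start time that is not after "now". Here is a minimal sketch, assuming the map is a TreeMap (which offers floorEntry, unlike the plain SortedMap interface) and using minutes since midnight as keys instead of Date to keep the example self-contained. The names NowShowing, toMinutes and nowShowing are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class NowShowing {
    // Converts "HH:mm" into minutes since midnight.
    static int toMinutes(String hhmm) {
        String[] parts = hhmm.split(":");
        return Integer.parseInt(parts[0]) * 60 + Integer.parseInt(parts[1]);
    }

    // Returns the title with the latest start time that is not after `now`,
    // or null if no program in the map has started yet.
    static String nowShowing(TreeMap<Integer, String> programs, String now) {
        Map.Entry<Integer, String> entry = programs.floorEntry(toMinutes(now));
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> programs = new TreeMap<>();
        programs.put(toMinutes("06:00"), "Vesti");
        programs.put(toMinutes("07:15"), "Something Else");
        programs.put(toMinutes("09:10"), "Movie");
        System.out.println(nowShowing(programs, "08:30"));
    }
}
```

A time before the first entry returns null here; a real guide would probably need to look at the previous day's last program in that case.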
Jens
  • I did not use Html.fromHtml(String), but I used split, and it works like a charm now... ty – Igx33 Aug 16 '12 at 13:47
  • Html.fromHtml(..) is a good way to reliably remove HTML from input (more so than regexp, as it uses an actual HTML parser (tagsoup usually) to strip it). – Jens Aug 16 '12 at 13:51
  • You know what, when I used that Html.fromHtml(String) thing, it worked even better than before. Dude, you are a life saver. Ty so much!! – Igx33 Aug 16 '12 at 13:53
  • By the way, I know this is off topic, but you seem like a real pro. How would you suggest I check which program is currently showing on TV? This is a basic TV guide. I thought of getting the current time, splitting each string into hours (int), minutes (int) and the rest (String), and then comparing them somehow to find the one that is closest to, but not later than, the current time... – Igx33 Aug 16 '12 at 13:57
0

Use regex:

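import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<String> results = new ArrayList<String>();

// A time, a space, then everything up to the next tag.
// Backslashes and quotes must be escaped inside Java string literals.
Pattern pattern = Pattern.compile("\\d+:\\d+ [^<]+");
Matcher matcher = pattern.matcher("06:00 Vesti<br>07:15 Something Else<br>09:10 Movie<a href=\"...\"> ... <br>15:45 Something..");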

while(matcher.find()) {
    results.add(matcher.group(0));
}

results will end up as a list of strings:

results = List[
    "06:00 Vesti", 
    "07:15 Something Else", 
    "09:10 Movie", 
    "15:45 Something.."]

See the Java Regex Tutorial for an idea of how Java's regex library works.

Michael Allen
  • Umm, sorry, did you mean this for the raw parsed String, or for the string I created with String mim =ses.replaceAll("(?s)\\<.*?\\>", " \n"); ? I don't understand much of that, but I only need this done on the string mim, which has all the HTML tags replaced with \n. I just need to get rid of the \n and put that text into an array or list of strings like you did... – Igx33 Aug 16 '12 at 13:31