I'm working with Java regular expressions on the Android platform.
I'm trying to search an HTML page for matches of a regular expression.
Here's my code:
    public void mainaaForWWW(String websiteSource){
        try {
            websiteSource = readDataFromWWW(websiteSource);
        } catch (IOException e1) {
            e1.printStackTrace();
        }
        ArrayList<String> cinemaArray = new ArrayList<String>();
        Pattern sample = Pattern.compile("<div class=\"theatre\">");
        Matcher secuence = sample.matcher(websiteSource);
        try {
            while (secuence.find()) {
                cinemaArray.add(secuence.group());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        titleTableForWWW = new String[cinemaArray.size()];
        for(int i = 0; i < titleTableForWWW.length; i++)
            titleTableForWWW[i] = cinemaArray.get(i);
    }
The problem is quite strange: when I debug the code, the String websiteSource looks fine (the whole HTML document appears to be loaded), but the while loop runs only 4 times. Searching the HTML document manually, I found 11 matches. The regex is simplified here just to find out what's going on. Any ideas?
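One way to narrow this down is to run the same find loop on a small hard-coded string, which separates a regex problem from an input problem. The sketch below does that; the class name MatchCountCheck, the helper findAll, and the sample HTML are made up for illustration and are not part of my app.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchCountCheck {
    // Collects every match of the pattern in the given text.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            matches.add(matcher.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not the real page.
        String html = "<div class=\"theatre\">A</div>\n"
                + "<div class=\"theatre\">B</div>\n"
                + "<div class=\"theatre\">C</div>\n";
        Pattern sample = Pattern.compile("<div class=\"theatre\">");
        System.out.println(findAll(sample, html).size());
    }
}
```

If the count is right on the hard-coded string but too low on the downloaded one, the String being searched is probably not the same text as the page viewed by hand.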
OK, my bad. I found a solution.
Here's the code that was responsible for reading the HTML source into a String:
    public String readDataFromWWW(String UrlAdress) throws IOException
    {
        String line = null;
        URL url = new URL(UrlAdress);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream(), "ISO-8859-2"));
        while (rd.readLine() != null) {
            line += rd.readLine();
        }
        System.out.println(line);
        return line;
    }
I suspected that building the String this way might be messing something up, so I replaced the method with this one:
    public String readDataFromWWW(String UrlAdress) throws IOException
    {
        String wyraz = "";
        try {
            String webPage = UrlAdress;
            URL url = new URL(webPage);
            URLConnection urlConnection = url.openConnection();
            InputStream is = urlConnection.getInputStream();
            InputStreamReader isr = new InputStreamReader(is, "ISO-8859-2");
            int numCharsRead;
            char[] charArray = new char[1024];
            StringBuffer sb = new StringBuffer();
            while ((numCharsRead = isr.read(charArray)) > 0) {
                sb.append(charArray, 0, numCharsRead);
            }
            wyraz = sb.toString();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return wyraz;
    }
And everything works fine! Thanks a lot for the clues and help. I think the problem was connected with newlines while building the String, but I'm not quite sure.
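A likely explanation, offered as a sketch rather than a confirmed diagnosis: a loop of the form while (rd.readLine() != null) { line += rd.readLine(); } calls readLine() twice per iteration, so the line read in the condition is thrown away and only every second line is kept. Any match sitting on a discarded line never reaches the String. Starting from String line = null also puts the text "null" at the front of the result. The demo below reproduces that loop shape on an in-memory input; the class name ReadLineSkipDemo, both method names, and the sample input are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class ReadLineSkipDemo {
    // Same loop shape as the first readDataFromWWW: readLine() is called twice per iteration.
    static String readSkipping(BufferedReader rd) throws IOException {
        String line = null;
        while (rd.readLine() != null) { // this line is read and thrown away
            line += rd.readLine();      // only the following line is kept
        }
        return line;
    }

    // Each line is read exactly once and kept, with the line break restored.
    static String readAllLines(BufferedReader rd) throws IOException {
        StringBuilder sb = new StringBuilder();
        String current;
        while ((current = rd.readLine()) != null) {
            sb.append(current).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String input = "one\ntwo\nthree\nfour\n";
        // prints "nulltwofour": lines one and three are lost
        System.out.println(readSkipping(new BufferedReader(new StringReader(input))));
        // prints all four lines
        System.out.print(readAllLines(new BufferedReader(new StringReader(input))));
    }
}
```

The replacement method avoids this because it reads raw characters into a buffer and appends every chunk, so nothing is skipped and the line breaks stay in the text.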