
I would like to fetch specific data from my JSON data: every link in an href inside the markup <div id='gallery-1' ...>

For example, with this JSON data:

<p><strong style=\"font-size: 13px;\">22nd March</strong></p>\n
<p>Swell is 3 foot and clean but wind swing south west later. Get on the early</p>\n
<p><span id=\"more-113\"></span></p>\n
<p>High tide: 1922 2.6m    <span style=\"color: #ff0000;\"> <a href=\"http://www.bundoransurfco.com/webcam/\">
<strong>CLICK HERE FOR LIVE PEAK WEBCAM</strong></a></span></p>\n
<p>Low Tide: 1249 -0.1m</p>\n<p><b>3 day forecast to March 23rd</b></p>\n
<p>Looks like a fun few days with light winds and a long period swell.</p>\n\n\t\t
<style type='text/css'>\n\t\t\t#gallery-1 {\n\t\t\t\tmargin: auto;\n\t\t\t}\n\t\t\t
#gallery-1 .gallery-item {\n\t\t\t\tfloat: left;\n\t\t\t\tmargin-top: 10px;\n\t\t\t\t
text-align: center;\n\t\t\t\twidth: 50%;\n\t\t\t}\n\t\t\t#gallery-1 img {\n\t\t\t\t
border: 2px solid #cfcfcf;\n\t\t\t}\n\t\t\t
#gallery-1 .gallery-caption {\n\t\t\t\t
margin-left: 0;\n\t\t\t}\n\t\t\t
/* see gallery_shortcode() in wp-includes/media.php */\n\t\t</style>\n\t\t
<div id='gallery-1' class='gallery galleryid-113 gallery-columns-2 gallery-size-medium'>
<dl class='gallery-item'>\n\t\t\t<dt class='gallery-icon portrait'>\n\t\t\t\t
<a rel=\"prettyPhoto[gallery-113]\" href='http://www.bundoransurfco.com/wp-content/uploads/2014/11/10411096_10152611456607000_886839954460588268_n.jpg'>
<img width=\"225\" height=\"300\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2014/11/10411096_10152611456607000_886839954460588268_n-225x300.jpg\" 
class=\"attachment-medium colorbox-113 \" alt=\"10411096_10152611456607000_886839954460588268_n\" /></a>\n\t\t\t
</dt></dl>\n\t\t\t
<br style='clear: both' />\n\t\t</div>\n\n
<p><a href=\"http://www.bundoransurfco.com/webcam/\"> </a></p>\n
<h1> Wind Charts</h1>\n<p><a href=\"http://www.windguru.cz/int/index.php?sc=103244\">
<img class=\"size-thumbnail wp-image-747 alignleft\" title=\"wind guru\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/wind-guru-67x68.jpg\" alt=\"\" width=\"67\" height=\"68\" /></a> <a href=\"http://www.xcweather.co.uk/\"><img class=\"alignnone size-thumbnail wp-image-749\" title=\"xcweathersmall\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/xcweathersmall2-67x68.jpg\" alt=\"\" width=\"67\" height=\"68\" /></a>       <a href=\"http://www.buoyweather.com/wxnav6.jsp?region=UK&program=nww3BW1&grb=nww3&latitude=55.0&longitude=-8.75&zone=0&units=e\"><img class=\"alignnone size-thumbnail wp-image-750\" title=\"buoy weather\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/buoy-weather-67x68.jpg\" alt=\"\" width=\"67\" height=\"68\" /></a> <a href=\"http://www.windguru.cz/int/index.php?sc=103244\">Wind Guru</a>       <a href=\"http://www.xcweather.co.uk/\">XC Weather</a>       <a href=\"http://www.buoyweather.com/wxnav6.jsp?region=UK&program=nww3BW1&grb=nww3&latitude=55.0&longitude=-8.75&zone=0&units=e\">Buoy Weather</a></p>\n<h1>Swell Charts</h1>\n<p><a href=\"http://magicseaweed.com/Bundoran-Surf-Report/50/\"><img class=\"alignnone size-thumbnail wp-image-753\" title=\"msw logo\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/msw-logo-67x43.jpg\" alt=\"\" width=\"75\" height=\"43\" /></a>             <a href=\"http://magicseaweed.com/UK-Ireland-MSW-Surf-Charts/1/\"><img class=\"alignnone size-thumbnail wp-image-754\" title=\"magicseaweedwamchart\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/magicseaweedwamchart1-67x68.png\" alt=\"\" width=\"67\" height=\"68\" /></a>       <a href=\"http://www.marine.ie/Home/site-area/data-services/marine-forecasts/wave-forecasts\"><img class=\"alignnone wp-image-755 size-thumbnail\" title=\"marine institute irish bouy data\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/marine-institute-irish-bouy-data-67x42.jpg\" alt=\"\" 
width=\"67\" height=\"42\" /></a>                 <a href=\"http://magicseaweed.com/Bundoran-Surf-Report/50/\">Magic Seaweed</a>      <a href=\"http://magicseaweed.com/UK-Ireland-MSW-Surf-Charts/1/\">MSM WAM</a>          <a href=\"http://www.marine.ie/Home/site-area/data-services/marine-forecasts/wave-forecasts\">Marine Institute</a></p>\n<h1>Pressure, Weather, Tides</h1>\n<p><a href=\"http://news.bbc.co.uk/weather/forecast/13000\"><img class=\"alignnone size-thumbnail wp-image-756\" title=\"bbc pressure\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/bbc-pressure-67x68.jpg\" alt=\"\" width=\"67\" height=\"68\" /></a>          <a href=\"http://www.met.ie/\"><img class=\"alignnone size-thumbnail wp-image-759\" title=\"met eireann\" src=\"http://www.bundoransurfco.com/wp-content/uploads/2010/12/met-eireann-67x68.jpg\" alt=\"\" width=\"67\" height=\"68\" /></a>            <a href=\"http://news.bbc.co.uk/weather/forecast/13000\">BBC Pressure</a>      <a href=\"http://www.met.ie/\">Met Eireann</a>      <a href=\"http://www.irishtimes.com/weather/tides.html\">Irish Tide Tables</a></p>\n

I want to fetch only: http://www.bundoransurfco.com/wp-content/uploads/2014/11/10411096_10152611456607000_886839954460588268_n.jpg

I can fetch every <a> tag that contains an href (see NSLog(@"%@", url);), but my NSPredicate doesn't work, and I really only need the hrefs inside <div id='gallery-1' ...>

Here is my code:

#pragma mark - Regex <a href=http://.........>
        NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"<a[^>]*>" options:NSRegularExpressionCaseInsensitive error:nil];
        NSArray *arrayOfAllMatches = [regex matchesInString:stringBDD options:0 range:NSMakeRange(0, [stringBDD length])];

        NSMutableArray *arrayOfURLs = [[NSMutableArray alloc] init];
        for (NSTextCheckingResult *match in arrayOfAllMatches) {
            NSString* substringForMatch = [stringBDD substringWithRange:match.range];
            [arrayOfURLs addObject:substringForMatch];
        }

#pragma mark - NSPredicate
        NSArray *url = [NSArray arrayWithArray:arrayOfURLs];
        NSLog(@"%@", url);
        NSPredicate *predicate = [NSPredicate predicateWithFormat:@"SELF beginswith[c] %@", @"<a href='http://www.bundoransurfco.com/wp-content/uploads/"];
        NSArray *arrayPictures = [url filteredArrayUsingPredicate:predicate];
        NSLog(@"%@", arrayPictures);

#pragma mark - Count number of pictures find
        NSUInteger count = 0, length = [stringBDD length];
        NSRange range = NSMakeRange(0, length);
        while(range.location != NSNotFound)
        {
            range = [stringBDD rangeOfString: @"<a href='http://www.bundoransurfco.com/wp-content/uploads/" options:0 range:range];
            if(range.location != NSNotFound)
            {
                range = NSMakeRange(range.location + range.length, length - (range.location + range.length));
                count++;
            }
        }
        NSLog(@"%zd", count);

UPDATE:

NSURL *url = [NSURL URLWithString:@"http://www.bundoransurfco.com/surf-report/surf-report/?json=1"];
if (url)
{
    NSData * urlDataToParse = [NSData dataWithContentsOfURL:url];
    TFHpple * parser = [TFHpple hppleWithHTMLData:urlDataToParse];
    NSArray * ahrefNodes = [parser searchWithXPathQuery:@"//a[@href]"]; //array of all <a href> and <iframe>
    NSLog(@"%@", ahrefNodes);
}

That returns a lot of content, not just the <a href> tags ...

NSURL *url = [NSURL URLWithString:@"http://www.bundoransurfco.com/surf-report/surf-report/?json=1"];
if (url)
{
    NSData * urlDataToParse = [NSData dataWithContentsOfURL:url];
    TFHpple * parser = [TFHpple hppleWithHTMLData:urlDataToParse];
    NSArray * ahrefNodes = [parser searchWithXPathQuery:@"//div[@id='gallery-1']"]; //array of all <a href> and <iframe>
    NSLog(@"%@", ahrefNodes);
}

Doesn't work :/

Vjardel
  • Well, what does your JSON data look like? You've just included a stream of HTML with an excess of whitespace characters and escape \'s. The URL of the website you list isn't responding at all at the moment, so I cannot check it myself. Even so, copy/pasting what you provided yields results... messy results, because of the whitespace and escape \'s. See an example I outputted: http://pastebin.com/uukbpxwb – Louis Tur Mar 24 '15 at 00:14

1 Answer


You can make this a lot easier on yourself by using Hpple. It parses HTML and lets you query it with XPath expressions (because apparently you shouldn't parse HTML directly).

For example, I have a method that checks for a valid URL. Following that, I query the content using Hpple for all <a> and <iframe> tags, specifying that I only want those that carry an href or src attribute, respectively:

if (url)
{
     NSData * urlDataToParse = [NSData dataWithContentsOfURL:url];
     TFHpple * parser = [TFHpple hppleWithHTMLData:urlDataToParse];
     NSArray * ahrefNodes = [parser searchWithXPathQuery:@"//a[@href]|//iframe[@src]"]; //array of all <a href> and <iframe>
}

The NSArray that is returned holds one TFHppleElement per matching <a href=... and <iframe src=... tag; the href or src value can then be read from each element's attributes. I'd check out the repo I linked for specifics, but for your case the search query would start from something like @"//div[@id]". You may also find this link to be helpful.

Some Updates:

Use the XPath query I listed (@"//div[@id]"). Specifying the value of the id attribute you want isn't going to work because of all the extra escape sequences in your data.

I got full results using the following:

- (void)viewDidLoad {
    [super viewDidLoad];

    NSURL * sampleDataURL = [[NSBundle mainBundle] URLForResource:@"surfSample" withExtension:@"json"]; // I created this by just copy/pasting your provided "json" into a file and giving it a .json extension
    NSData * sampleData = [NSData dataWithContentsOfURL:sampleDataURL];

    TFHpple * dataParser = [TFHpple hppleWithData:sampleData isXML:NO];
    NSArray * nodes = [dataParser searchWithXPathQuery:@"//div[@id]"];

    for (TFHppleElement * elementNode in nodes) {
        NSLog(@"The element: %@", elementNode);
    }
}

As I wrote in my comment on the question, the results are very messy. See the pastebin: http://pastebin.com/DpfDKcmp

Still, cleaning a string of escape characters is far easier than using regex and predicates, so you should have an easier time accomplishing your goal. (Meaning: either get the content as actual JSON, or convert it all into a string and use something like stringByReplacingOccurrencesOfString:withString:.)
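
To illustrate the "get the content as actual JSON" route, here is a minimal Foundation-only sketch. Several things in it are assumptions rather than facts about your feed: the helper name GalleryLinksFromJSONData is made up, the HTML is assumed to sit under the keys "page" and "content", and the gallery div is assumed to contain no nested <div>. Decoding with NSJSONSerialization turns every \" and \n\t back into real characters, so the markup becomes ordinary HTML. The link extraction below uses two small regular expressions only to keep the example free of third-party code; with Hpple you could instead hand the decoded string to the parser and try a query along the lines of @"//div[@id='gallery-1']//a[@href]".

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: returns every href found inside <div id='gallery-1'>.
// ASSUMPTION: the HTML string lives at page -> content in the JSON document.
static NSArray<NSString *> *GalleryLinksFromJSONData(NSData *jsonData) {
    // Decoding the JSON removes the \" and \n\t escape sequences.
    id root = [NSJSONSerialization JSONObjectWithData:jsonData options:0 error:NULL];
    if (![root isKindOfClass:[NSDictionary class]]) {
        return @[];
    }
    id page = ((NSDictionary *)root)[@"page"];
    if (![page isKindOfClass:[NSDictionary class]]) {
        return @[];
    }
    id content = ((NSDictionary *)page)[@"content"];
    if (![content isKindOfClass:[NSString class]]) {
        return @[];
    }
    NSString *html = (NSString *)content;

    // Isolate the gallery div (ASSUMPTION: no <div> is nested inside it).
    NSRegularExpression *divRegex = [NSRegularExpression
        regularExpressionWithPattern:@"<div[^>]*\\bid=['\"]gallery-1['\"][^>]*>(.*?)</div>"
                             options:(NSRegularExpressionCaseInsensitive |
                                      NSRegularExpressionDotMatchesLineSeparators)
                               error:NULL];
    NSTextCheckingResult *divMatch = [divRegex firstMatchInString:html
                                                          options:0
                                                            range:NSMakeRange(0, html.length)];
    if (divMatch == nil) {
        return @[];
    }
    NSString *gallery = [html substringWithRange:[divMatch rangeAtIndex:1]];

    // Collect the href value of every <a> tag inside the gallery only.
    NSRegularExpression *hrefRegex = [NSRegularExpression
        regularExpressionWithPattern:@"<a\\b[^>]*\\bhref=['\"]([^'\"]+)['\"]"
                             options:NSRegularExpressionCaseInsensitive
                               error:NULL];
    NSMutableArray<NSString *> *links = [NSMutableArray array];
    NSArray<NSTextCheckingResult *> *matches =
        [hrefRegex matchesInString:gallery options:0 range:NSMakeRange(0, gallery.length)];
    for (NSTextCheckingResult *match in matches) {
        [links addObject:[gallery substringWithRange:[match rangeAtIndex:1]]];
    }
    return links;
}
```

You would call it with the NSData you already download, for example NSArray *links = GalleryLinksFromJSONData(urlDataToParse);, and log the result. If the array comes back empty, the key path is the first thing to check against the real JSON.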

Louis Tur
  • Thank you for your reply. I tested Hpple like you said, but I get far too much content ... The URL I use is from a website that returns JSON data: http://www.bundoransurfco.com/surf-report/surf-report/?json=1 , can you help me please? – Vjardel Mar 23 '15 at 22:23
  • I've tested @"//div[@id='gallery-1']" but that doesn't work :/ – Vjardel Mar 23 '15 at 22:34
  • and from what I can understand, you're only interested in getting back `http://www.bundoransurfco.com/wp-content/uploads/2014/11/10411096_10152611456607000_886839954460588268_n.jpg` ? – Louis Tur Mar 23 '15 at 22:47
  • Updated. Yes, that's it! But the div id='gallery-1' can contain many hrefs pointing to jpg files, and I would like to fetch every one of them – Vjardel Mar 23 '15 at 22:49
  • Nice! But my JSON data comes directly from the URL http://www.bundoransurfco.com/surf-report/surf-report/?json=1, and there is much more data there than in my NSDictionary filtered with objectAtIndex etc., so your code doesn't work on it. :/ – Vjardel Mar 24 '15 at 10:13