Web-scraping user comments with PHP cURL and storing them in MySQL: is it possible?
<?php
$url = 'https://www.flipkart.com/samsung-galaxy-on5-gold-8-gb/product-reviews/itmedhx3uy3qsfks?pid=MOBECCA5FHQD43KA';
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL,$url);
curl_setopt($ch, CURLOPT_USERAGENT, 'Opera/9.23 (Windows NT 5.1; U; en)');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2);
$result = curl_exec($ch);
curl_close($ch);
//echo $result;
preg_match_all('/<div class="_3DCdKt">([^`]*?)<\/div>/', $result, $matches);
echo count($matches[1]); // number of captured review bodies
print_r($matches);
?>

Error: I get a blank page from Flipkart.

Any help is much appreciated.

Thanks in advance.

  • The page is loading, as your `echo $result;` shows; the problem is that the data you're after is probably being loaded by JavaScript after the page has loaded. You can check this by viewing the source of the page and searching for your class name. – Nigel Ren Jun 19 '17 at 17:44
  • nope, the class name is not in the page at all. i have traced down the comment fetch url to https://www.flipkart.com/api/3/page/dynamic/product-reviews though. – hanshenrik Jun 19 '17 at 17:47
  • On page load, the review section is not displayed; only its header and footer come through. I'm doing the same thing on the product page, and there it scrapes all the data. – Harsh Pandya Jun 20 '17 at 07:08
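The check Nigel Ren describes can also be done in PHP on the string `curl_exec()` returned. This is a minimal sketch: `htmlContainsClass()` is a hypothetical helper name, and the sample HTML stands in for a real response.

```php
<?php
declare(strict_types=1);

/**
 * Returns true if the raw HTML returned by curl_exec() contains the given
 * class name anywhere. If it does not, the element is most likely rendered
 * later by JavaScript, so no HTML parser or regex will find it either.
 */
function htmlContainsClass(string $html, string $className): bool
{
    return strpos($html, $className) !== false;
}

// Hypothetical stand-in for a response that only has header/footer markup.
$sample = '<html><body><div class="header">Header</div></body></html>';
var_dump(htmlContainsClass($sample, '_3DCdKt'));
var_dump(htmlContainsClass($sample, 'header'));
```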


please read this https://stackoverflow.com/a/1732454/1067003 and come back. read it? good. now here's a decent alternative: DOMDocument. with that, you've solved your first problem: trying to parse HTML with regex.
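When a page does include the content in its HTML, a DOMDocument version of the question's regex could look like the sketch below. It uses `DOMXPath` to select `div` elements by class; the helper name `textOfDivsWithClass()` and the sample markup are illustrative only.

```php
<?php
declare(strict_types=1);

/**
 * Collects the text content of every <div> whose class attribute contains
 * $className as a whole word, using DOMDocument/DOMXPath instead of a regex.
 * Assumes $className contains no double quotes.
 */
function textOfDivsWithClass(string $html, string $className): array
{
    if ($html === '') {
        return []; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Pad with spaces so class="x _3DCdKt y" matches but "_3DCdKtz" does not.
    $query = sprintf(
        '//div[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $className
    );

    $texts = [];
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

$html = '<div class="_3DCdKt">First review</div><div class="other">skip</div>'
      . '<div class="x _3DCdKt">Second review</div>';
print_r(textOfDivsWithClass($html, '_3DCdKt'));
```

Note that `loadHTML()` assumes ISO-8859-1 unless the document declares its charset, so non-ASCII text such as ₹ can come out garbled from fragments without a `<meta charset>`.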

your second problem is that this website does not serve the reviews themselves in the html, but instead fetches them via javascript. the obvious course of action, then, is to start digging into the internal api the javascript uses to fetch the comments. do that, and you'll probably find the url https://www.flipkart.com/api/3/page/dynamic/product-reviews , which is where the javascript fetches the comments from. however, when replicating the javascript request with curl, you'll probably get a 403 Forbidden error until you figure out that you must fake the custom x-user-agent header to match whatever you put in CURLOPT_USERAGENT. doing that, you'll probably end up with something like this:

<?php
declare(strict_types = 1);
$url = 'https://www.flipkart.com/samsung-galaxy-on5-gold-8-gb/product-reviews/itmedhx3uy3qsfks?pid=MOBECCA5FHQD43KA';
// the product id is the "pid" query parameter of the review page url
parse_str ( parse_url ( $url, PHP_URL_QUERY ), $query );
$productID = $query ['pid'];
$ch = curl_init ();
$useragent = 'Opera/9.23 (Windows NT 5.1; U; en)';
curl_setopt_array ( $ch, array (
        CURLOPT_USERAGENT => $useragent,
        // internal endpoint the review page's javascript fetches reviews from
        CURLOPT_URL => 'https://www.flipkart.com/api/3/page/dynamic/product-reviews',
        CURLOPT_POST => true,
        CURLOPT_HTTPHEADER => array (
                'Content-Type: application/json',
                // must match CURLOPT_USERAGENT, otherwise the api answers 403 Forbidden
                'x-user-agent: ' . $useragent . ' FKUA/website/41/website/Desktop' 
        ),
        CURLOPT_POSTFIELDS => json_encode ( array (
                'requestContext' => array (
                        'productId' => $productID 
                ) 
        ) ),
        CURLOPT_RETURNTRANSFER => true 
) );
$json = curl_exec ( $ch );
if ( $json === false ) {
    die ( 'curl error: ' . curl_error ( $ch ) );
}
curl_close ( $ch );
$parsed = json_decode ( $json, true );
$comments = $parsed ['RESPONSE'] ['data'] ['product_review_page_default_1'] ['data'];
foreach ( $comments as $comment ) {
    var_dump ( $comment ['value'] ['text'] );
}

which outputs:

string(157) "I have bought it in ₹6,291.00
Very good product in this price.
I have updated it to Marshmallow.

Only drawback is there is no Notification LED in the set."
string(422) "More than Worth to buy .
Simply super With 1.5GB RAM
Processor is Excellent
RAM is No hanging up & No heating
Replaceble battery
Wonderful clicks with Excellent back and front camera
4G is Awesome
Awesome touchscreen
Battery is backup also great  
Excellent UI
Additional features
1. Power saving ..
2. Data restrict. 
Both are excellent and useful to batter backup
Final verdict, I will recommend this phone to my buddies"
string(189) "I Bought this phone yesterday.. 
Positive Points
1.Camera Quality is Good
2.HD Display
3.Slim & Good Design
4. 4G Supports
5.Latest Android Version

Negative Points
1. Less Internal Storage"
string(261) "Under 9k this buy is good
you get a handset with good display and battery backup
And it is good for day to day usage
your cant play heavy games on it due to 1 gb Ram

For Android User this is the cheapest and the best phone which Samsung can offer

So go for it"
string(2515) "This post is long:) I will not be repeating the features which others have already liked (e.g. the display, camera, gaming speed, storage etc...) All these are good and work well as expected from Samsung phones. 

The phone does not look cheap from any angle. Some have even confused this with the Galaxy A5, so this says a lot! 

First, with our Indian Air force issuing some statement in TOI about Chinese phones snooping in on our data, i was reluctant to buy any of these phones mentioned below.
Yu, MI, Huawei, Lenovo, Gionee and Micromax (yes, they import these from China)) 
This phone ticks all the right boxes with Samsung reliability. 

Second, glad to see this phone being manufactured in India as part of 'Make in India' campaign. Way to go Samsung! The South Korean manufacturer and in general the South Korean industry has grown into a superpower. (LG, Samsung, Hyundai are all shining examples of craftsmanship and quality) 

The Gold color looks classy. Cover her up with a matching authentic gold flip color from Samsung and you have a phone which turns heads. What more! you get a Samsung S5 design of last year, which is good, considering the bezel lines, chrome touch and minimalist look to this phone. 

Display is great and does not make you feel that you lose out on the Amoled display. So full marks here. Outdoor visibility can be tuned manually. 

Speaker quality is good and for those who are complaining, there is an 'Extra volume' option when answering calls. There is also a 'Sound Adapt' feature to tune the phone to your hearing needs. 

The phone also comes with a 'Smart Manager' to take care of RAM, storage, battery and device security. For enhanced security, you can register your mobile on samsung find my mobile and enable remote tracking on. This will help you to locate your device using GPS. Though i'm yet to use this feature (hopefully not:), the remote tracking feature can also wipe or lock your phone remotely, in case of theft. 

The usb charger seems to be fragile, so you need to be careful while plugging in. 

Have also discovered the FM Radio service which is absent in now-a-days phone. This is other than the 'MixRadio' feature which does online streaming. So like old days, you can listen to your favourite FM channel on the go without a 3G or 4G connection. Wow! i now can listen to my favourite songs and can record these too on my device. 

So go for it. It is definitely worth every rupiah (penny)

PS: if you found this review useful, give me a thumbs up"
string(1124) "Hello friends, I am writing this review after 2 days of usage. I always compare/read reviews before buying any products. 
Pros: 

*It arrives in style via ekart.
*Flipkart packing is very nice and protective.
*Weight of phone including battery is quiet light.
*Look and feel is very much attractive, my colleagues compared with their expensive samsung phones, this phone also looks of same segment.
*Fine details on edge looks posh, it gets broader at the bottom, overall it is great. 
*UI and 5.1 lollipop is simple and very organised.
*Music setting has a feature called Adapt sound, it enable every individual to fine tune the amount of sound for both ears, so that it suites you best without hurting your eardrums. The earphone which come is bad for music lovers, try getting your best brand's headphone, I use sony xb400 (old one,that time price was aprx 2K) it bounces nice. ;)
* Apart from this every feature is nicely blended to cooperate daily life activities. 
*There is no heating issue yet.
*This phone is definitely value for money product. 

Cons: 

*No notification lights.
*No themes
*Less default wallpapers"
string(77) "Everthing os good and fine...thank u flipkart...mobile is so good and nice..."
string(818) "Initial Review:

What I liked:
+ SAR Value of 0.551 W/Kg. 
+ 5" large screen looks great inspite of TFT display.
+ Excellent Camera - Front and Back.
+ Decent performance, no lag noticed so far.
+ Android 5.1
+ Good build quality with premium finish.
+ No bloatware.

What I didn't like:
- Average Speaker. Wish it was louder, crisper and in the front.
- No Notification light. Why Samsung why?
- No Ambient Light Sensor.
- A little expensive considering the Redmi 2 is available for Rs. 5999.

If you are big on music, the speaker quality will disappoint you. No excuse for not adding notification light and better quality speaker for the price. 

Redmi 2 is a superior VFM phone, but since I as buying it for my father - the low SAR value was a big selling point including the big screen and the Samsung brand value."
string(249) "Can give good competition to other phone makers. Amazing fast, good features, and the biggest brand name "SAMSUNG".

Prefer buying Samsung over any other Chinese phones, because at this price it can beat any other phone and woo factor also included."
string(203) "Good product...delivered on 21.10.16 in 2 days. Camera, battery backup everything is ok. No heating,hanging problem found. 550 mb RAM was accessible. Even after upgrading to MARSHMALLOW no problem found."
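Building on the key path used above and the `author`/`rating` keys mentioned later in the comments, a helper like the one below returns one row per review and uses `??` so a missing key yields `null` instead of an undefined-index warning. Flipkart's internal API is undocumented, so these keys are an assumption and may have changed; `extractReviews()` and the sample JSON are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Pulls author, rating and text out of the decoded review JSON.
 * Key names are assumptions taken from this thread, not a documented API.
 * Missing keys produce null instead of PHP warnings.
 */
function extractReviews(array $parsed): array
{
    $items = $parsed['RESPONSE']['data']['product_review_page_default_1']['data'] ?? [];
    if (!is_array($items)) {
        return [];
    }
    $reviews = [];
    foreach ($items as $item) {
        $value = $item['value'] ?? [];
        $reviews[] = [
            'author' => $value['author'] ?? null,
            'rating' => $value['rating'] ?? null,
            'text'   => $value['text'] ?? null,
        ];
    }
    return $reviews;
}

// Hypothetical response shape, trimmed to the keys the helper reads.
$json = '{"RESPONSE":{"data":{"product_review_page_default_1":{"data":['
      . '{"value":{"author":"A","rating":5,"text":"Good"}},'
      . '{"value":{"text":"No author"}}]}}}}';
$parsed = json_decode($json, true);
if (!is_array($parsed)) {
    exit('Could not decode JSON: ' . json_last_error_msg() . PHP_EOL);
}
print_r(extractReviews($parsed));
```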

EDIT: didn't paste the whole code the first time around x.x

hanshenrik
  • this is a 404 page: [link](https://www.flipkart.com/api/3/page/dynamic/product-reviews). i don't understand this, can you explain? or is the above code not working? i'm getting a warning. – Harsh Pandya Jun 20 '17 at 07:03
  • Yes, it's working. Thanks a lot for helping me. If I want it with pagination, what should I do? And how do I scrape the whole div with username, stars and date? – Harsh Pandya Jun 20 '17 at 07:21
  • @HarshPandya usernames and stars and date and all that stuff, is in the $parsed array :) `var_dump($parsed);` – hanshenrik Jun 20 '17 at 11:55
  • @HarshPandya and yeah, it says 403 (or 404?) if it's not a POST request with the appropriate headers and POST content data – hanshenrik Jun 20 '17 at 11:56
  • you are right, `var_dump($parsed);` shows all the data, but how do I show parts of it separately? `['product_review_page_default_1'] ['data']` gives only the reviews; how do I get the stars and username the same way??? AND _thanks a lot_ for helping me. – Harsh Pandya Jun 22 '17 at 06:26
  • @HarshPandya sigh, are you really clueless, having no idea how to find them yourself, or are you just lazy? `$parsed = json_decode ( $json, true ); $commentsData = $parsed ['RESPONSE'] ['data'] ['product_review_page_default_1'] ['data']; foreach ( $commentsData as $commentData ) { $commentAuthor = $commentData ['value'] ['author']; $commentStars = $commentData ['value'] ['rating']; $comment = $commentData ['value'] ['text']; var_dump ( $commentAuthor, $commentStars, $comment ); }` – hanshenrik Jun 22 '17 at 08:42
  • sometimes we don't understand how to find simple things, that's why people ask, but it's ok. after your advice, i will focus on it. you are intelligent, and thank you so much, i appreciate it. :) – Harsh Pandya Jun 22 '17 at 11:01
  • @hanshenrik, any idea how to do the same for Snapdeal & Amazon? Do they provide an API? – Harsh Pandya Jun 30 '17 at 11:10
  • @HarshPandya idk about Snapdeal, but i did write a scraper for Amazon some time back, scraping Amazon reviews was easy (and the comments were embedded in the HTML) – hanshenrik Jun 30 '17 at 11:29
  • @hanshenrik, could you please share the Amazon scraper so I can understand it? – Harsh Pandya Jun 30 '17 at 11:59
  • @HarshPandya i don't have *that* code anymore, but it looked much like this https://gist.github.com/divinity76/83d0aaa1cba3a801b1d97600f4ae7f4a – hanshenrik Jun 30 '17 at 15:44
  • it didn't work for me. & thank you @hanshenrik for your help. – Harsh Pandya Jul 04 '17 at 06:57
  • @HarshPandya didn't work how? do you get any error logs? it works fine here, output: https://paste.ratma.net/p/149 – hanshenrik Jul 04 '17 at 09:52
  • hhb_.inc.php gives an error (Parse error: syntax error, unexpected ':', expecting '{' in hhb_.inc.php on line 9) and the page shows blank output – Harsh Pandya Jul 04 '17 at 11:03
  • @HarshPandya oh, that's because you're running PHP 5.x. this code is written for PHP 7 – hanshenrik Jul 04 '17 at 11:40
  • @HarshPandya there is a PHP 5 version at https://github.com/divinity76/hhb_.inc.php/blob/master/hhb_.inc.php5.php , though – hanshenrik Jul 04 '17 at 11:43
  • it worked, thanks @hanshenrik. on Snapdeal, the star reviews only come up through a click-triggered ajax event. is there a way to solve that?? – Harsh Pandya Jul 05 '17 at 05:19
  • @HarshPandya yup, much like i explained in the original post, check how the browser & javascript are fetching the full comments, and replicate those requests with curl – hanshenrik Jul 05 '17 at 09:51
  • @hanshenrik, thanks. It worked. But how do I change the `page number` with this? – Rohan Khude Feb 27 '18 at 14:05
  • @HarshPandya did you manage to scrape product images from flipkart? – Sachin Oct 12 '19 at 06:35
  • @Sachin i never tried, but a quick look reveals that the image urls are in json inside the html under the ` – hanshenrik Oct 12 '19 at 12:58
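The question also asks about storing the comments in MySQL, which the thread doesn't cover. One common approach is PDO with a prepared statement, sketched below; the `reviews` table, its columns and the connection details are assumptions, and the hypothetical `storeReviews()` expects rows shaped like `['author' => ..., 'rating' => ..., 'text' => ...]`.

```php
<?php
declare(strict_types=1);

/**
 * Inserts review rows with a prepared statement inside one transaction.
 * Assumes PDO::ERRMODE_EXCEPTION and a hypothetical table such as:
 *   CREATE TABLE reviews (
 *     id INT AUTO_INCREMENT PRIMARY KEY,
 *     author VARCHAR(255) NULL,
 *     rating INT NULL,
 *     review_text TEXT NULL
 *   ) CHARACTER SET utf8mb4;
 * Returns the number of rows inserted.
 */
function storeReviews(PDO $pdo, array $reviews): int
{
    $stmt = $pdo->prepare('INSERT INTO reviews (author, rating, review_text) VALUES (?, ?, ?)');
    $count = 0;
    $pdo->beginTransaction();
    try {
        foreach ($reviews as $review) {
            // Placeholders keep scraped text from being interpreted as SQL.
            $stmt->execute([
                $review['author'] ?? null,
                $review['rating'] ?? null,
                $review['text'] ?? null,
            ]);
            $count++;
        }
        $pdo->commit();
    } catch (Throwable $e) {
        $pdo->rollBack();
        throw $e;
    }
    return $count;
}

// Hypothetical connection; charset=utf8mb4 keeps characters such as ₹ intact.
// $pdo = new PDO('mysql:host=localhost;dbname=scraper;charset=utf8mb4', 'user', 'password',
//     [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// storeReviews($pdo, $reviews);
```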