
I'm trying to extract the list of movies (titles), their dates and times, and whether each showing is "MX4D-2D", "VIP", etc. from this website. The website uses JavaScript to load its content dynamically, so I used file_get_contents and strip_tags on the page, and now I'm left with the JavaScript as plain text. It contains the movie information, formatted like this:

movieData = {
        '2019-11-16': [
            /*will have to change nowPlaying to have separate dates everywhere */
            {
                'url': 'the-addams-family',
                'image-portrait': 'https://d10u9ygjms7run.cloudfront.net/0009/1573662077853_HO00002023.jpeg',
                'image-landscape': 'https://d10u9ygjms7run.cloudfront.net/0009/1573662079231_h-HO00002023.jpeg',
                'title': 'The Addams Family',
                'releaseDate': '2019-10-17',
                'endpoint': 'HO00002023',
                'duration': '87 mins',
                'rating': 'Rated PG',
                'director': 'Greg Tiernan, Conrad  Vernon',
                'actors': 'Charlize Theron, Oscar Isaac, Chloë Grace  Moretz, Allison Janney, Elsie Fisher, Nick Kroll, Bette Midler, Finn  Wolfhard, Aimee  Garcia',
                'times': [
                        { 'type': '','time': '12:45pm', 'bookingLink': 'https://themoviesticketing.com/ticketing/visSelectTickets.aspx?cinemacode=0009&txtSessionId=41264&visLang=1', 'attributes': [] },
                ]
            },
            {
                'url': 'black-and-blue',
                'image-portrait': 'https://d10u9ygjms7run.cloudfront.net/0009/1573662057611_HO00002024.jpeg',
                'image-landscape': 'https://d10u9ygjms7run.cloudfront.net/0009/1573662058845_h-HO00002024.jpeg',
                'title': 'Black and Blue',
                'releaseDate': '2019-10-24',
                'endpoint': 'HO00002024',
                'duration': '108 mins',
                'rating': 'Rated R',
                'director': 'Deon Taylor',
                'actors': 'Naomie  Harris, Frank Grillo, Tyrese Gibson, Mike Colter, Reid Scott, Beau Knapp, Nafessa Williams',
                'times': [
                        { 'type': '','time': '10:00pm', 'bookingLink': 'https://themoviesticketing.com/ticketing/visSelectTickets.aspx?cinemacode=0009&txtSessionId=41257&visLang=1', 'attributes': [] },
                        { 'type': '','time': '11:15pm', 'bookingLink': 'https://themoviesticketing.com/ticketing/visSelectTickets.aspx?cinemacode=0009&txtSessionId=41229&visLang=1', 'attributes': [] },
                ]
            },

The text also contains additional JS that I don't need. Is there an easy way to remove it and grab only the information I need? My end goal is to store this in a DB so I can keep track of movies from different cinemas. The full code is here: https://pastebin.com/TA0rfSB8

JonattanD

2 Answers


If the text is valid JSON (a stricter format than JavaScript object literals), you can convert it into PHP values using json_decode.

Example:

<?php
$json = '{"foo-bar": 12345}';

$obj = json_decode($json);
print $obj->{'foo-bar'}; // 12345
?>

You need to make sure that your text is in proper JSON format first: remove movieData = at the beginning, use double quotes instead of single quotes, and drop the trailing commas and the /* ... */ comment, since JSON allows neither. You can use this tool to validate your JSON string so you know exactly what still needs to change.
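A minimal sketch of that conversion in PHP, assuming the page text holds one movieData = {...} literal whose single-quoted strings escape apostrophes only as \' and whose comments are /* */ blocks. The helper names extractObjectLiteral and jsLiteralToJson are hypothetical. The first walks the braces (ignoring any that sit inside strings or comments), so the extra JavaScript after the object is left out; the second swaps the quotes and drops the comment and trailing commas in a single regex pass.

```php
<?php
// Sketch only. Assumptions: single-quoted strings escape apostrophes as \'
// and comments are /* */ blocks (no // line comments outside strings).

// Return the balanced { ... } that follows $varName, skipping braces that
// appear inside strings or block comments, so any JS after it is ignored.
function extractObjectLiteral(string $page, string $varName = 'movieData'): string
{
    $pos = strpos($page, $varName);
    $start = $pos === false ? false : strpos($page, '{', $pos);
    if ($start === false) {
        throw new RuntimeException("No object literal after $varName");
    }
    $depth = 0;
    $quote = null; // current string delimiter, or null outside strings
    for ($i = $start, $n = strlen($page); $i < $n; $i++) {
        $c = $page[$i];
        if ($quote !== null) {
            if ($c === '\\') {
                $i++;                       // skip the escaped character
            } elseif ($c === $quote) {
                $quote = null;              // string closed
            }
        } elseif ($c === "'" || $c === '"') {
            $quote = $c;                    // string opened
        } elseif ($c === '/' && substr($page, $i, 2) === '/*') {
            $close = strpos($page, '*/', $i + 2);
            if ($close === false) {
                break;
            }
            $i = $close + 1;                // jump past the comment
        } elseif ($c === '{') {
            $depth++;
        } elseif ($c === '}' && --$depth === 0) {
            return substr($page, $start, $i - $start + 1);
        }
    }
    throw new RuntimeException('Unbalanced braces');
}

// Rewrite a JS object literal as JSON in one left-to-right pass. Strings are
// matched first, so comment markers or commas inside them are left alone.
function jsLiteralToJson(string $literal): string
{
    $pattern = '~"(?:[^"\\\\]|\\\\.)*"'    // double-quoted string: keep
             . "|'(?:[^'\\\\]|\\\\.)*'"    // single-quoted string: convert
             . '|/\*.*?\*/'                // block comment: drop
             . '|,(?=\s*[\]}])~s';         // trailing comma: drop

    return preg_replace_callback($pattern, function (array $m): string {
        $tok = $m[0];
        if ($tok[0] === '"') {
            return $tok;
        }
        if ($tok[0] === "'") {
            // Undo \' and let json_encode emit a valid JSON string.
            return json_encode(str_replace("\\'", "'", substr($tok, 1, -1)), JSON_THROW_ON_ERROR);
        }
        return ''; // block comment or trailing comma
    }, $literal);
}
```

Usage: $data = json_decode(jsLiteralToJson(extractObjectLiteral($text)), true); where $text is the page text you already have after strip_tags. Because whole string literals are matched before anything else, an apostrophe or comma inside a title is not mangled, which a plain str_replace of ' with " would not guarantee.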

  • The posted text is not valid JSON. – Federico klez Culloca Nov 15 '19 at 11:37
  • I know, but he could attempt to convert it to valid JSON. It's not the most reliable solution, but if that JavaScript is the only input he has, it's a valid one. – Alessandro Tedesco Nov 15 '19 at 11:46
  • Yeah, when I posted my initial comment you didn't have the part where you state it's not valid JSON, so I pointed that out. I agree it's a viable (though not the best) solution. – Federico klez Culloca Nov 15 '19 at 11:47
  • Not really. If he is going to manipulate the text, he is better off using regex to get what he needs. – Martin Dimitrov Nov 15 '19 at 11:48
  • Hmm, I guess this is the most viable way to go. The script I'm working on is supposed to be automatic, so I need a way for it to convert the JS objects to JSON automatically. Would str_replace be the best way to go about this? – JonattanD Nov 15 '19 at 16:39
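For comparison, a sketch of the regex-only route suggested in the comments: it reads the 'title' values straight from the script text without building JSON. The function name extractTitles is hypothetical, and it assumes titles are single-quoted with embedded apostrophes written as \'.

```php
<?php
// Sketch only: grab every 'title' value directly from the JS text.
// Assumes titles are single-quoted and apostrophes inside are escaped as \'.
function extractTitles(string $js): array
{
    // Capture group 1 is the string body; \\. lets an escaped \' pass through.
    preg_match_all("~'title'\s*:\s*'((?:[^'\\\\]|\\\\.)*)'~", $js, $m);
    // Undo the \' escape in each captured title.
    return array_map(function (string $t): string {
        return str_replace("\\'", "'", $t);
    }, $m[1]);
}
```

This is quick for a single field, but pairing titles with their dates and showtimes through separate patterns gets fragile, which is where converting the whole literal to JSON pays off. Likewise, a blanket str_replace of ' with " would also rewrite apostrophes inside values (a title such as "Don't ..."), so a pattern that matches whole string literals is the safer way to swap the quotes.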

In the end, all I had to do was convert the text to JSON: I used regex to change the single quotes to double quotes and kept adjusting the format until it was valid JSON, which I could then decode and use.
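Once the text decodes, the remaining step toward the database goal is turning the nested structure into flat rows. A minimal sketch, assuming the array came from json_decode(..., true) and has the shape shown in the question (date keys, then movies with 'title' and 'times'). The function name flattenShowtimes and the output row keys are hypothetical, not a required schema, and treating 'type' as the field that carries labels like "MX4D-2D" or "VIP" is an assumption, since it is empty in the sample.

```php
<?php
// Sketch only: one row per showing, ready for a database insert.
// Input keys 'title', 'times', 'time', 'type' follow the sample data;
// the output keys are illustrative names.
function flattenShowtimes(array $movieData): array
{
    $rows = [];
    foreach ($movieData as $date => $movies) {      // e.g. '2019-11-16'
        foreach ($movies as $movie) {
            foreach ($movie['times'] ?? [] as $show) {
                $rows[] = [
                    'date'  => (string) $date,
                    'title' => $movie['title'],
                    'time'  => $show['time'],
                    // Assumed to hold labels such as "MX4D-2D" or "VIP";
                    // it is an empty string for every showing in the sample.
                    'type'  => $show['type'] ?? '',
                ];
            }
        }
    }
    return $rows;
}
```

Each row could then be bound to a PDO prepared INSERT statement together with an identifier for the cinema, since the aim is to track several cinemas in the same table.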

JonattanD