
When we try to access profiles on sites such as LinkedIn, we get text in both HTML and JSON format: https://www.linkedin.com/in/aaron-jacobs-3b513261/

The useful data I want to grab is in JSON format. How can I parse it with json.loads(data) while ignoring the HTML part?

<!DOCTYPE html>
<html lang="en">
  <head>
    <script type="text/javascript" src="https://gc.kis.v2.scr.kaspersky-labs.com/9E1E45EF-3F97-184C-B471-44EF675548EA/main.js" charset="UTF-8"></script><link rel="stylesheet" crossorigin="anonymous" href="https://gc.kis.v2.scr.kaspersky-labs.com/AE845576FE44-174B-C481-79F3-FE54E1E9/abn/main.css"/><script type="application/javascript">!function(i,n){void 0!==i.addEventListener&&void 0!==i.hidden&&(n.liVisibilityChangeListener=function(){i.hidden&&(n.liHasWindowHidden=!0)},i.addEventListener("visibilitychange",n.    liVisibilityChangeListener))}(document,window);</script>

    <meta charset="utf-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <title>LinkedIn</title>
    <meta name="description" content="">
    <meta name="viewport" content="width=device-width, initial-scale=1.0, minimum-scale=1.0">
    <meta name="theme-color" content="#0077B5">

<!---Data is Here--->
<meta name="extended/config/environment" content="%7B%22modulePrefix%22%3A%22extended%22%2C%22environment%22%3A%22production%22%2C%22datawarm%22%3A%7B%22enabled%22%3Atrue%7D%2C%22lix%22%3A%7B%22tests%22%3A%5B%22voyager.web.feed.authorDisabledComments%22%2C%22voyager.web.feed.badge%22%2C%22voyager.web.feed.channelUpdates%22%2C%22voyager.web.feed.comment-article%22%2C%22voyager.web.feed.comment-image%22%2C%22voyager.web.feed.connectionUpdates%22%2C%22voyager.web.feed.deleteReportModal%22%2C%22voyager.web.feed.feedBadgeCountInTotal%22%2C%22voyager.web.feed.followRecommendationUpdates%22%2C%22voyager.web.feed.lyndaUpdates%22%2C%22voyager.web.feed.index.disable-top-text-ad%22%2C%22voyager.web.feed.mentionedInNewsUpdate%22%2C%22voyager.web.feed.promos%22%2C%22voyager.web.feed.updateIndicatorThreshold%22%2C%22voyager.feed.web.share-via-control-panel%22%2C%22voyager.feed.web.dropshadow-on-article-box%22%2C%22voyager.feed.web.hoverable-link-text%22%2C%22voyager.feed.web.enable-sort-toggle%22%2C%22voyager.feed.web.disable-identity-module-stickiness%22%2C%22voyager.feed.client.photo-upload%22%2C%22voyager.feed.client.su-actor-subheadline%22%2C%22voyager.feed.client.su-commentary%22%2C%22voyager.feed.client.su-follow-button%22%2C%22voyager.feed.client.su-highlight-comment-see-more%22%2C%22voyager.feed.client.su-ad-choice%22%2C%22voyager.feed.client.update.cp.enabled%22%2C%22voyager.feed.client.update.cp.report.enabled%22%2C%22voyager.feed.like-on-comment%22%2C%22voyager.feed.reply-on-comment%22%2C%22voyager.web.feed.enable-share-as-message%22%2C%22voyager.me.web.content_analytics_feed_entry_shares%22%2C%22voyager.feed.web.full-width-images%22%2C%22voyager.feed.web.max-small-image-width%22%2C%22neptune.feed.web.max-small-image-width%22%2C%22publishin%22%2C%22voyager.feed.web.sharing.twitter-visibility%22%2C%22voyager.feed.web.sharing.hide-url-input%22%2C%22voyager.feed.web.extended.sharing.subaction-bar-rounded-button-theme%22%2C%22voyager.feed.web.sharing.increase-char-limit%22
%2C    %22voyager.web.feed.sponsoredUpdateTracking%22%2C%22voyager.feed.client.hashtags%22%2C%22voyager.search.client.vertical-nav%22%2C%22voyager.search.web.postsVertical%22%2C%22voyager.search.web.right-rail-news-module%22%2C%22voyager.feed.web.rich-media.hide-reshare-button%22%2C%22voyager.web.feed.rmv.hideDetailAndActions%22%2C%22neptune.jobs.enabledNeptune%22%2C%22voyager.web.feed.occlusion-culling%22%2C%22voyager.web.prefetch-lazy-images%22%2C%22voyager.web.feed.editors-pick%22%2C%22voyager.web.feed.right-rail.follow-recommendations%22%2C%22voyager.web.feed.use-composition%22%2C%22voyager.web.feed.follow-page%22%2C%22voyager.web.feed.initial-fetch-update-count%22%2C%22voyager.feed.video.expand.support%22%2C%22voyager.feed.video.autoplay.support%22%2C%22voyager.feed.video.heartbeat.interval%22%2C%22voyager.feed.web.video-upload%22%2C%22voyager.feed.web.video-upload.duration-limit%22%2C%22voyager.feed.web.hide-comments-initially%22%2C%22voyager.web.feed.updateIndicatorThreshold%22%2C%22voyager.web.feed.nup%22%2C%22voyager.feed.video.provider.linkedin%22%2C%22voyager.feed.video.provider.slideshare%22%2C%22voyager.feed.video.provider.vimeo%22%2C%22voyager.feed.video.provider.youtube%22%2C%22voyager.feed.web.likers-modal.additional-paging-request%22%2C%22voyager.sharing.web.remember-visibility-settings%22%2C%22voyager.web.sharing.keep-post-button-active%22%2C%22voyager.web.feed.fie.visibleHeight%22%2C%22voyager.web.feed.perf.layered-rendering%22%2C%22voyager.web.feed.improve-feed-via-control-menu%22%2C%22voyager.jobs.web.deferJymbii%22%2C%22postal_code_location_typeahead_jserp%22%2C%22voyager.premium.web.jobs-fastGrowingCompaniesUpsell%22%2C%22voyager.search.jobs-search.web.create-search-alert-hovercard-enabled%22%2C%22voyager.premium.web.jobPosterUpsell%22%2C%22neptune.launchpad.gate%22%2C%22neptune.launchpad.one-step-flow%22%2C%22voyager.messaging.client.draft-leave-prompt%22%2C%22voyager.messaging.client.forwarding%22%2C%22voyager.messaging.client.enable-group-t
opcard-facepile%22%2C%22voyager.messaging.client.enable-image-gif-virus-scan%22%2C%22voyager.messaging.client.enable-image-unrolling%22%2C%22voyager.messaging.client.enable-image-virus-scan%22%2C%22voyager.messaging.client.enable-impression-tracking%22%2C%22voyager.messaging.client.enable-leave-web%22%2C%22voyager.messaging.client.enable-lss-unsubscribe%22%2C%22voyager.messaging.client.enable-member-actions-web%22%2C%22voyager.messaging.clien  
hacke john
  • It seems like you need to decode it first. See https://stackoverflow.com/questions/16566069/url-decode-utf-8-in-python – VMRuiz May 23 '17 at 10:50
  • I'm not making any connections, just reading and extracting from a file. – hacke john May 23 '17 at 11:07
  • I'm not talking about connections. The decoding is for converting `%7B%22modulePrefix%22%3A%22extended` into `{"modulePrefix":"extended` so it can be parsed by the json module. – VMRuiz May 23 '17 at 11:15
  • Would a parser do the job? – hacke john May 23 '17 at 11:18
  • No, a JSON parser won't decode URL-encoded strings, and I don't fully understand what your question is. – VMRuiz May 23 '17 at 11:26
  • I want to parse it with json, but json won't parse the file because it contains both HTML tags and JSON data objects. – hacke john May 23 '17 at 11:31
  • I thought you had already resolved the HTML parsing problem. You need to use https://www.crummy.com/software/BeautifulSoup/bs4/doc/# to extract the JSON content from the HTML code. – VMRuiz May 23 '17 at 12:56
  • Yeah, I did the HTML parsing with bs4, but JSON parsing is still not working. – hacke john May 23 '17 at 17:37
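
Putting the comments above together, here is a minimal sketch of the three steps: pull the `content` attribute out of the `<meta>` tag, URL-decode it, then hand the result to `json.loads`. It uses only the standard library (`html.parser` instead of bs4) so it runs without extra installs. The class `MetaContentParser`, the function `extract_meta_json`, and the shortened `sample_html` string are made up for illustration; the real page's attribute is far longer and is truncated in the question, so this has not been checked against the full file.

```python
import json
from html.parser import HTMLParser
from urllib.parse import unquote


class MetaContentParser(HTMLParser):
    """Collect the content attribute of every named <meta> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            if "name" in attrs and "content" in attrs:
                self.meta[attrs["name"]] = attrs["content"]


def extract_meta_json(html_text, meta_name):
    """Return the URL-encoded JSON stored in <meta name=meta_name> as a Python object."""
    parser = MetaContentParser()
    parser.feed(html_text)
    encoded = parser.meta[meta_name]   # still looks like %7B%22modulePrefix%22...
    decoded = unquote(encoded)         # now looks like {"modulePrefix": ...
    return json.loads(decoded)


# Shortened stand-in for the page in the question (assumption, not the real file).
sample_html = (
    '<html><head><meta name="extended/config/environment" '
    'content="%7B%22modulePrefix%22%3A%22extended%22%2C'
    '%22environment%22%3A%22production%22%7D"></head></html>'
)

config = extract_meta_json(sample_html, "extended/config/environment")
print(config["modulePrefix"])
```

If bs4 is already doing the HTML step, the same value should be available from `soup.find("meta", attrs={"name": "extended/config/environment"})["content"]`; the `unquote` call before `json.loads` is the part that was missing. Note that `json.loads` will still fail if the attribute was cut off or had line breaks pasted into the middle of a string, as appears to be the case in the sample above.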

0 Answers