0

How can I extract phrases that begin with "phone" and end with '}' using regex in Python?

I tried to extract data from a page source. This is the data:

{"meta":{"subtitle":"Apartment for Rent in Marina Gate 1, Marina Gate","price":145000,"price_text":"145,000 AED/year","contact_options":{"list":{"phone":{"type":"phone","value":"+XXXXXXXX","link":"tel:+XXXXXXXX","is_did":true},"email":{"type":"email","value":"name@email.com","link":"mailto:name@email.com"}},"details":{"phone":{"type":"phone","value":"+XXXXXXXX","link":"tel:+XXXXXXXX","is_did":true},"sms":{"type":"sms","value":"+XXXXXXXX","link":"sms:+XXXXXXXX"},"email":{"type":"email","value":"name@email.com","link":"mailto:name@email.com"}}},"images_count":11}}'

I want to use regex to extract all the phrases that start with phone and end with }.

I tried this: re.findall(r"^phone(.*)}$",source)

This is what I want: "phone","value":"+XXXXXXXX","link":"tel:+XXXXXXXX","is_did":true}
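For reference, here is a minimal sketch of why the attempted pattern returns nothing and what a pattern for the desired output could look like. The shortened `source` string and the name `phone_pattern` are illustrative assumptions, not part of the original page source.

```python
import re

# Shortened sample with the same '"type":"phone", ... }' shape as the
# question's data (an assumption made for illustration).
source = ('{"list":{"phone":{"type":"phone","value":"+XXXXXXXX",'
          '"link":"tel:+XXXXXXXX","is_did":true},'
          '"email":{"type":"email","value":"name@email.com"}}}')

# ^ and $ anchor the match to the start and end of the whole string,
# so this only matches if the string itself begins with "phone".
attempt = re.findall(r"^phone(.*)}$", source)
print(attempt)

# Start at the quoted value "phone" and stop at the first closing brace.
# [^}]* cannot cross a '}', so each match ends with its own object.
phone_pattern = re.compile(r'"phone","value"[^}]*\}')
matches = phone_pattern.findall(source)
print(matches)
```

This relies on the objects being flat (no nested braces inside the phone entry), which holds for the sample but is not guaranteed for arbitrary JSON.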

isstiaung
emanuel lemos
  • What does your data look like? Is it a string or a dictionary? – Nakor Jun 22 '19 at 06:58
  • This looks like JSON. Are you certain regex is the right way to go? Why not just load the JSON and get the values you want? – isstiaung Jun 22 '19 at 06:59
  • @Nakor It's a string. I want to find it with regex. – emanuel lemos Jun 22 '19 at 07:03
  • @isstiaung Yes, it is, because I do not know how to do it. If you can tell me, that would be amazing. The website is https://www.propertyfinder.ae/en/rent/apartment-for-rent-dubai-dubai-marina-marina-gate-marina-gate-1-6951117.html and the phone is in the – emanuel lemos Jun 22 '19 at 07:04
  • Have a look at [this link](https://stackoverflow.com/questions/4528099/convert-string-to-json-using-python) and try it out; it should solve your problem. – isstiaung Jun 22 '19 at 07:05

2 Answers

1

It might be better to use the json module for this rather than regex. Try this:

import json
test_str = '{"meta":{"subtitle":"Apartment for Rent in Marina Gate 1, Marina Gate","price":145000,"price_text":"145,000 AED/year","contact_options":{"list":{"phone":{"type":"phone","value":"+XXXXXXXX","link":"tel:+XXXXXXXX","is_did":true},"email":{"type":"email","value":"name@email.com","link":"mailto:name@email.com"}},"details":{"phone":{"type":"phone","value":"+XXXXXXXX","link":"tel:+XXXXXXXX","is_did":true},"sms":{"type":"sms","value":"+XXXXXXXX","link":"sms:+XXXXXXXX"},"email":{"type":"email","value":"name@email.com","link":"mailto:name@email.com"}}},"images_count":11}}'
print(test_str)

# Parse the JSON text into nested Python dictionaries.
data = json.loads(test_str)
print(data)

# Walk down to the phone entry under contact_options -> list.
phone_num = data['meta']['contact_options']['list']['phone']

print(phone_num)
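Since the question asks for all the phone entries (there is one under "list" and one under "details"), a recursive walk over the parsed JSON may be more convenient than hard-coding each path. The following is only a sketch; the helper name `find_values` and the trimmed `sample` string are assumptions made for illustration.

```python
import json

def find_values(node, key):
    """Yield every value stored under `key`, at any depth of parsed JSON."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            # Keep descending, since nested objects may hold the key too.
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)

# Trimmed version of the question's data (an assumption for illustration).
sample = ('{"meta":{"contact_options":{'
          '"list":{"phone":{"type":"phone","value":"+XXXXXXXX",'
          '"link":"tel:+XXXXXXXX","is_did":true}},'
          '"details":{"phone":{"type":"phone","value":"+XXXXXXXX",'
          '"link":"tel:+XXXXXXXX","is_did":true},'
          '"sms":{"type":"sms","value":"+XXXXXXXX","link":"sms:+XXXXXXXX"}}}}}')

phones = list(find_values(json.loads(sample), "phone"))
for phone in phones:
    print(phone["value"], phone["link"])
```

Here the "sms" entry is skipped because only values stored under the key "phone" are collected.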
isstiaung
0

You can try this code (parsing the <script> tag with re):

import requests
import json
import re

html_text = requests.get('https://www.propertyfinder.ae/en/rent/apartment-for-rent-dubai-dubai-marina-marina-gate-marina-gate-1-6951117.html').text
data = json.loads(re.findall(r'payload\s*:\s*(.*?)\n', html_text)[0])

for d in data['data']:
    print(d['meta']['subtitle'])
    print(d['meta']['contact_options'])
    print('*' * 80)

Prints:

Apartment for Rent in Marina Gate 1, Marina Gate
{'list': {'phone': {'type': 'phone', 'value': '+971528347286', 'link': 'tel:+971528347286', 'is_did': True}, 'email': {'type': 'email', 'value': 'ahmad@providentestate.com', 'link': 'mailto:ahmad@providentestate.com'}}, 'details': {'phone': {'type': 'phone', 'value': '+971528347286', 'link': 'tel:+971528347286', 'is_did': True}, 'sms': {'type': 'sms', 'value': '+971581806000', 'link': 'sms:+971581806000'}, 'email': {'type': 'email', 'value': 'ahmad@providentestate.com', 'link': 'mailto:ahmad@providentestate.com'}}}
********************************************************************************
Apartment for Rent in Marina Gate 1, Marina Gate
{'list': {'phone': {'type': 'phone', 'value': '+971525226138', 'link': 'tel:+971525226138', 'is_did': True}, 'email': {'type': 'email', 'value': 'suhail.p@w2realestate.com', 'link': 'mailto:suhail.p@w2realestate.com'}}, 'details': {'phone': {'type': 'phone', 'value': '+971525226138', 'link': 'tel:+971525226138', 'is_did': True}, 'sms': {'type': 'sms', 'value': '+971503940533', 'link': 'sms:+971503940533'}, 'email': {'type': 'email', 'value': 'suhail.p@w2realestate.com', 'link': 'mailto:suhail.p@w2realestate.com'}}}
********************************************************************************
Apartment for Rent in Marina Gate 1, Marina Gate
{'list': {'phone': {'type': 'phone', 'value': '+971528347286', 'link': 'tel:+971528347286', 'is_did': True}, 'email': {'type': 'email', 'value': 'ahmad@providentestate.com', 'link': 'mailto:ahmad@providentestate.com'}}, 'details': {'phone': {'type': 'phone', 'value': '+971528347286', 'link': 'tel:+971528347286', 'is_did': True}, 'sms': {'type': 'sms', 'value': '+971581806000', 'link': 'sms:+971581806000'}, 'email': {'type': 'email', 'value': 'ahmad@providentestate.com', 'link': 'mailto:ahmad@providentestate.com'}}}
********************************************************************************
Apartment for Rent in Marina Gate 1, Marina Gate
{'list': {'phone': {'type': 'phone', 'value': '+971522233791', 'link': 'tel:+971522233791', 'is_did': True}, 'email': {'type': 'email', 'value': 'eddy@exclusive-links.com', 'link': 'mailto:eddy@exclusive-links.com'}}, 'details': {'phone': {'type': 'phone', 'value': '+971522233791', 'link': 'tel:+971522233791', 'is_did': True}, 'sms': {'type': 'sms', 'value': '+971523279984', 'link': 'sms:+971523279984'}, 'email': {'type': 'email', 'value': 'eddy@exclusive-links.com', 'link': 'mailto:eddy@exclusive-links.com'}}}
********************************************************************************
Apartment for Rent in Marina Gate 1, Marina Gate
{'list': {'phone': {'type': 'phone', 'value': '+971565775168', 'link': 'tel:+971565775168', 'is_did': False}, 'email': {'type': 'email', 'value': 'julia@abodeproperty.ae', 'link': 'mailto:julia@abodeproperty.ae'}}, 'details': {'phone': {'type': 'phone', 'value': '+971565775168', 'link': 'tel:+971565775168', 'is_did': False}, 'sms': {'type': 'sms', 'value': '+971565775168', 'link': 'sms:+971565775168'}, 'email': {'type': 'email', 'value': 'julia@abodeproperty.ae', 'link': 'mailto:julia@abodeproperty.ae'}}}
********************************************************************************

Note: the website sometimes returns malformed HTML, so you may need to run the script several times until it succeeds (the regex may need tuning; I haven't investigated further).
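One possible way to make the extraction less dependent on where the line breaks fall is to locate the marker with a regex and let `json.JSONDecoder.raw_decode` read exactly one JSON value from that position. This is a sketch under assumptions: the helper name `extract_json_after` and the sample `html_text` are hypothetical, and the real page layout may differ.

```python
import json
import re

def extract_json_after(text, marker_pattern):
    """Decode the first JSON value that follows `marker_pattern` in `text`."""
    match = re.search(marker_pattern, text)
    if match is None:
        raise ValueError("marker not found")
    # raw_decode parses a single JSON value starting at the given index and
    # returns it with the index where it ended; trailing text is ignored.
    obj, _end = json.JSONDecoder().raw_decode(text, match.end())
    return obj

# Hypothetical stand-in for the downloaded page source.
html_text = ('<script>var page = {\n'
             '  payload: {"data": [{"meta": {"subtitle": "Sample listing"}}]},\n'
             '  other: 1\n'
             '};</script>')

data = extract_json_after(html_text, r'payload\s*:\s*')
print(data['data'][0]['meta']['subtitle'])
```

The pattern ends with `\s*` so that the start index lands on the opening brace, because `raw_decode` does not skip leading whitespace.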

Andrej Kesely