I have a JSON file that contains metadata for 900 articles. I want to delete all the data except for the lines that contain URLs and save the result as a .txt file.
I wrote this code, but I couldn't work out how to do the saving step:
import re

with open(r"path\url_example.json") as file:
    for line in file:
        urls = re.findall(r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', line)
        print(urls)
Part of the output:
['http://www.google.com.']
['https://www.tutorialspoint.com']
Another issue is that each result is wrapped in [' '] and may end with a trailing period (.).
I don't need this. My expected result is:
http://www.google.com
https://www.tutorialspoint.com
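
For reference, here is a minimal sketch of one way the expected output could be produced and saved. The brackets appear because re.findall returns a list, so the sketch writes each match on its own line instead of printing the list, and it strips a trailing period with str.rstrip. The names URL_PATTERN, extract_urls and save_urls, the output file name urls.txt, and the UTF-8 encoding are assumptions made for illustration; they are not part of the original code.

```python
import re

# Same pattern as in the question, compiled once and written as a raw string.
URL_PATTERN = re.compile(r"https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+")


def extract_urls(lines):
    """Return every URL found in the given lines, without trailing periods."""
    urls = []
    for line in lines:
        for url in URL_PATTERN.findall(line):
            # findall returns plain strings here; drop any trailing "."
            urls.append(url.rstrip("."))
    return urls


def save_urls(json_path, txt_path):
    """Read json_path line by line and write one URL per line to txt_path."""
    # Encoding is assumed to be UTF-8; adjust if the file uses another one.
    with open(json_path, encoding="utf-8") as src:
        urls = extract_urls(src)
    with open(txt_path, "w", encoding="utf-8") as dst:
        dst.writelines(url + "\n" for url in urls)
    return urls


# Hypothetical usage (the output file name is made up for this example):
# save_urls(r"path\url_example.json", r"path\urls.txt")
```

Note that this pattern stops at the first "/" after the host name, so it captures only the scheme and host of each URL. If the full path is needed, the character class would have to be extended.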