
So I have a text file that contains several links along with other text. More specifically, it is a list of Twitter data (tweets that I have liked), and I'm trying to compile the image links (the t.co links) into a single text file. So I wrote this script:


```python
FileObject = open(r"like.txt","r")
word = str(FileObject)
link=[]
result = re.search('https://t.co', word)
while True:
        try:
            result_string = result.group(0)
            link.append(result_string)
            word= word.replace(result_string, "")
            result = re.search('https://t.co', word)
            FileObject2 = open(r"list.txt","r+")
            if link(None):
                print("No Image URLS Found")
            else:
                FileObject2.write(link + "\n")
                FileObject2.close("list.txt")
                result = re.search('https://t.co', word)
        except: break
```

However, when I run it, nothing is added to list.txt. Please help.

Here are a couple of lines from the text file:

```
"like" :
  "tweetId" : "1594749508147191808"
  "fullText" : "@tragicbirdapp https://t(dot)co/LTEe5qrv0B"
  "expandedUrl" : "https://twitter.com/i/web/status/1594749508147191808"

"like" :
  "tweetId" : "1594880996431781890"
  "fullText" : "New Drawing https://t(dot)co/kLziQSpbrT"
  "expandedUrl" : "https://twitter.com/i/web/status/1594880996431781890"
```
KniteRite
  • Did you try `print(word)`? It's not the contents of the file. Use `fileObject.read()` – Barmar Nov 22 '22 at 19:38
  • `re.search()` will only return the first match. Use `re.findall()` to get all the matches. – Barmar Nov 22 '22 at 19:39
  • Your use of `re.search()` won't return the whole URL, it will just return the part that matches the regexp. You need a pattern that matches the rest of the URL. – Barmar Nov 22 '22 at 19:40
  • `word = str(FileObject)` Calling str() on a file object does _not_ give you the contents of the file... – John Gordon Nov 22 '22 at 19:41
  • Your "like.txt" file object is still open when you open a second one. Don't do that. – Tim Roberts Nov 22 '22 at 19:46
  • @Barmar unfortunately there IS no pattern for these links, other than that all of them start and end with `"`. – KniteRite Nov 22 '22 at 19:49
  • We can't read your mind, and your code is mostly nonsense. Please show us what your file looks like, and what you are expecting the output to be. – Tim Roberts Nov 22 '22 at 19:49
  • https://stackoverflow.com/questions/3809401/what-is-a-good-regular-expression-to-match-a-url – Barmar Nov 22 '22 at 19:51
  • @TimRoberts I added an example of the text file I'm pulling from. I already explained the expected output. – KniteRite Nov 22 '22 at 19:57
  • That's not a text file, that's JSON. Don't parse this with regular expressions. Use `data = json.load(open("like.txt"))`, then go through the list items with a `for` loop and fetch `row["fullText"]`. – Tim Roberts Nov 22 '22 at 20:53
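
Following the JSON suggestion above, here is a minimal sketch. It assumes `like.txt` holds a JSON array whose items look like `{"like": {"tweetId": ..., "fullText": ..., "expandedUrl": ...}}`; the excerpt in the question has its braces and commas stripped, so that shape is an assumption. It also assumes a t.co path is made only of letters and digits. `TCO_PATTERN`, `extract_tco_links` and `save_tco_links` are hypothetical names, not part of any library.

```python
import json
import re

# Assumption: a t.co short link is "https://t.co/" followed by letters and digits.
TCO_PATTERN = re.compile(r"https://t\.co/[A-Za-z0-9]+")


def extract_tco_links(likes):
    """Collect every t.co link from the fullText of each liked tweet (hypothetical helper)."""
    links = []
    for row in likes:
        # Assumed item shape: {"like": {"tweetId": ..., "fullText": ..., "expandedUrl": ...}}
        full_text = row.get("like", {}).get("fullText", "")
        # findall returns every match in the string, not just the first one.
        links.extend(TCO_PATTERN.findall(full_text))
    return links


def save_tco_links(in_path="like.txt", out_path="list.txt"):
    """Parse the likes file as JSON and write one t.co link per line (hypothetical helper)."""
    with open(in_path, encoding="utf-8") as src:
        text = src.read()  # the file's contents, not str() of the file object
    # Assumption: anything before the first "[" (for example a JavaScript
    # assignment such as "window.YTD.like.part0 = ") is not part of the JSON.
    likes = json.loads(text[text.index("["):])
    links = extract_tco_links(likes)
    if not links:
        print("No image URLs found")
        return links
    # "w" creates the output file if needed and overwrites it otherwise.
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(links) + "\n")
    return links
```

With both files in the working directory, `save_tco_links()` reads `like.txt` and writes the links to `list.txt`. Only one input and one output file are open at a time, and each is closed by its `with` block.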

1 Answer


Try this for reading the file: https://www.tutorialkart.com/python/python-read-file-as-string/

```python
# open the text file in read mode; "with" closes it automatically
with open("D:/data.txt", "r") as text_file:
    # read the whole file into a string
    word = text_file.read()
```
  • You will also need to find a pattern that matches the links, but without an example of the text file it is difficult to know what that pattern may be. – user3273429 Nov 22 '22 at 19:48
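
If the file can't be parsed as JSON, a minimal sketch of the regex route raised in the comments: read the whole file with `.read()`, as this answer suggests, then use `re.findall` with a pattern that matches the full link rather than only the `https://t.co` prefix. Based on the sample lines in the question, it assumes every t.co path consists only of letters and digits. `find_tco_links` and `copy_tco_links` are hypothetical names.

```python
import re

# Assumption: a t.co link is "https://t.co/" followed by letters and digits.
# Unlike the bare 'https://t.co' pattern, this also captures the link's path.
TCO_PATTERN = re.compile(r"https://t\.co/[A-Za-z0-9]+")


def find_tco_links(text):
    """Return every t.co link in text, in order of appearance (hypothetical helper)."""
    # findall returns all non-overlapping matches, not only the first one.
    return TCO_PATTERN.findall(text)


def copy_tco_links(in_path="like.txt", out_path="list.txt"):
    """Write each t.co link found in in_path to out_path, one per line."""
    with open(in_path, encoding="utf-8") as src:
        text = src.read()
    links = find_tco_links(text)
    with open(out_path, "w", encoding="utf-8") as dst:
        for link in links:
            dst.write(link + "\n")
    return links
```

The pattern stops at the first character that isn't a letter or digit, so the closing `"` of the `fullText` value is excluded. `https://twitter.com/...` URLs are not matched because `\.` only matches a literal dot after `https://t`.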