
I am trying to extract filenames from a topic in a supergroup. This is what I tried:

from pyrogram import Client

app = Client(
    name="@Peter_LongX",
    api_id=27*******,
    api_hash="b5*******************",
    phone_number="+393*******",
    password="********" or None
)

group_id = -1001867911973
topic_id = 692
msg_file_dict = {}

with app:
    for message in app.get_chat_history(group_id, limit=5):
        print(f"Message Link: {message.link}")
        print(f"Message ID: {message.id}")
        
        if message.link and str(topic_id) in message.link:
            print("Topic ID found in message link")
            
            if message.video or message.document.mime_type.startswith("video"):
                print("Video or video document found")
                
                msg_id = message.id
                file_name = message.video.file_name or f"VID_{message.id}_{message.video.file_unique_id}.{message.video.mime_type.split('/')[-1]}"
                msg_file_dict[msg_id] = file_name

print(msg_file_dict.keys()) # List of Message ID
print(msg_file_dict.values()) # List of File Name

ERRORS

  1. It shows the latest message links / message IDs of the entire supergroup, not those of the specific topic. I want to extract filenames and message links/IDs from topic 692, an ID I took from the Telegram Web URL https://web.telegram.org/a/#-1867911973_692. However, it doesn't print the names of the documents, even though they belong to the supergroup.

  2. The topic contains documents, rar files and videos, but the filenames are not printed for the messages that contain them.

Any idea how to solve this?

In the console I see something like this:

PS C:\Users\Peter\Desktop\script\messagge_id_telegram> python getmsg.py
Message Link: https://t.me/lasoff...../223390
Message ID: 223390
Message Link: https://t.me/lasoff...../223389
Message ID: 223389

but I expect something like this

PS C:\Users\Peter\Desktop\script\messagge_id_telegram> python getmsg.py
Message Link: https://t.me/lasoff...../223390
Message ID: 223390

Filename: Fondazione_1x10.mp4
Message Link: https://t.me/lasoff...../223389
Message ID: 223389

For example, Message Link: https://t.me/lasoff...../223390 might not have a filename, because it may be a sticker or an emoji.

Additional question: do you have any idea how to print only the results from a specific date range, for example from July 1st to August 1st 2023?

EDIT: as quamrana suggested, I tried to print the filenames, but nothing changes: they are still not returned even when they exist. To do this I changed this part of the code

    if message.link and str(topic_id) in message.link:
        print("Topic ID found in message link")

with this

if message.link and str(topic_id) in message.link and (message.video or message.document.mime_type.startswith("video")):
    print("Topic ID found in message link")
    print("Video or video document found")
    msg_id = message.id
    file_name = message.video.file_name or f"VID_{message.id}_{message.video.file_unique_id}.{message.video.mime_type.split('/')[-1]}"
    msg_file_dict[msg_id] = file_name
Peter Long
  • In your actual output from the console, why don't you see any lines like: `"Topic ID found in message link"`? Also in the output you expect, why do you expect: `Filename: ...` when there is no `print()` which could output that? – quamrana Aug 28 '23 at 17:01
  • @quamrana I have now updated the question, the problem is the filenames are not returned anyway (when they exist, for example if they are videos in mp4 or mkv) – Peter Long Aug 28 '23 at 17:38
  • You're doing `limit=5`. If there were no messages on topic 692 in the last 5 group messages for the group, then you won't see anything. – Tim Roberts Aug 28 '23 at 17:57
  • @TimRoberts No, I also raised it to 500 but nothing changes: it only returns the message IDs, and in any case those of the whole supergroup, not those of the specific topic I requested. This picture shows what I mean: https://imgur.com/VmNlx7h.png - For example, the code should print the name of the video "Fondazione _1x10.mp4", but it doesn't do that even for the supergroup, and one of the errors is that the topic is not considered at all – Peter Long Aug 28 '23 at 20:50

1 Answer


The message links returned by get_chat_history() do not actually contain the topic ID; they follow a format like https://t.me/lasoff...../223390. Telegram constructs these links from the chat ID and message ID only.

Solution to error 1:

To get the messages of a specific topic, use get_discussion_replies(). This method takes the ID of the topic starter message and returns only the messages in that topic thread.

Solution to error 2:

You have an if condition in your code that is never satisfied:

if message.link and str(topic_id) in message.link:

The str(topic_id) in message.link part is never True, because your links don't contain the topic ID the way the web version URL does.
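To make the point concrete, here is a minimal sketch in plain Python (no Telegram connection needed). The group name example_group and all three links are hypothetical stand-ins, and the two helper functions are illustrative names, not part of Pyrogram. It shows that the substring test fails on a link without a topic segment, and that it can also give a false positive when a message ID merely contains the digits of the topic ID:

```python
from urllib.parse import urlparse


def substring_match(link, topic_id):
    """The check used in the question: a plain substring test."""
    return str(topic_id) in link


def path_segment_match(link, topic_id):
    """Stricter check: the topic ID must be a whole path segment
    between the chat name and the message ID."""
    segments = urlparse(link).path.strip("/").split("/")
    return str(topic_id) in segments[1:-1]


topic_id = 692
# Hypothetical links, for illustration only.
plain_link = "https://t.me/example_group/223390"      # no topic segment
lookalike = "https://t.me/example_group/226920"       # message ID contains "692"
topic_link = "https://t.me/example_group/692/223390"  # topic segment present

print(substring_match(plain_link, topic_id))     # False
print(substring_match(lookalike, topic_id))      # True (a false positive)
print(path_segment_match(lookalike, topic_id))   # False
print(path_segment_match(topic_link, topic_id))  # True
```

Either way, filtering on the link text is fragile, so asking the API for the topic's messages directly is the safer route.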

The corrected code looks like this:

# `app` is the pyrogram Client instance created in the question.
group_id = -1001867911973
topic_id = 692
msg_file_dict = {}

async def main():
    async with app:
        async for message in app.get_discussion_replies(chat_id=group_id, message_id=topic_id):
            print(f"Message ID: {message.id}")

            # Guard against a missing document or a missing MIME type before calling startswith().
            if message.video or (message.document and (message.document.mime_type or "").startswith("video")):
                file = message.video or message.document
                print("Video or video document found")

                msg_id = message.id
                file_name = file.file_name or f"VID_{message.id}_{file.file_unique_id}.{file.mime_type.split('/')[-1]}"
                print(file_name)
                msg_file_dict[msg_id] = file_name
            print()

app.run(main())
print(msg_file_dict.keys())  # List of Message ID
print(msg_file_dict.values())  # List of File Name
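
Regarding the additional question about a date range: one possible approach is to compare each message's date against the two bounds inside the loop. The sketch below is not tested against Telegram. It assumes every message exposes a date attribute holding a datetime (as Pyrogram 2.x messages are expected to), and it uses SimpleNamespace objects as stand-ins for real messages. The helper name in_date_range is hypothetical.

```python
from datetime import datetime
from types import SimpleNamespace


def in_date_range(message, start, end):
    """Return True when start <= message.date < end (end is exclusive)."""
    return message.date is not None and start <= message.date < end


start = datetime(2023, 7, 1)
end = datetime(2023, 8, 1)

# Stand-in messages; real ones would come from the async loop above.
messages = [
    SimpleNamespace(id=1, date=datetime(2023, 6, 30, 23, 59)),
    SimpleNamespace(id=2, date=datetime(2023, 7, 15, 12, 0)),
    SimpleNamespace(id=3, date=datetime(2023, 8, 1, 0, 0)),
]

selected = [m.id for m in messages if in_date_range(m, start, end)]
print(selected)  # [2]
```

In the loop above, the same test could be added as an extra condition before storing the filename. Since the end bound is exclusive here, move it to August 2nd if messages sent on August 1st should be included.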
beh-f
  • mm.. Is it possible to print the original filename of the video or document as well? I see only this in console `Message ID: 223340`, `Message ID: 223339 Video or video document found`, `Message ID: 223338 Video or video document found` but no original file name is printed after "Video or video document found". For example this `Message ID: 223339` is this link `https://t.me/lasoffittadownloads/661/223389` and the original file name is `Ergo Proxy - 1x23 - Emissario - 1080p by stress.mkv` but is not printed – Peter Long Aug 28 '23 at 23:13
  • @PeterLong I edited the answer, so it will now print the filename if it finds one. – beh-f Aug 29 '23 at 10:10