0

This Question is covering many topics and scenarios of Multiprocessing.

Stackoverflow

I've searched StackOverflow and although I've found many questions on this, I haven't found an answer that fits for my situation/not a strong python programmer to adapt their answer to fit my need. I am working on multiprocessing more than 2 weeks.

I have looked here youtube, stackoverflow, google, github to no avail:

https://www.youtube.com/watch?v=fKl2JW_qrso

https://www.youtube.com/watch?v=35yYObtZ95o

https://www.youtube.com/watch?v=IT8RYokUvvQ

kill a function after a certain time in windows

Creating a timeout function in Python with multiprocessing Handle multiprocessing.TimeoutError in multiprocessing pool.map_async()

many others.

Target Point

I am generating sitemap of different websites, websites urls are present in excel sheet and 'data' (present in below code) is the column name of websites. Some websites are taking so much time to crawl, I want if a website is taking more than 3 minutes to crawl, it stop the process, store the data that crawl in 3 minutes and start the new process.

I target the (row),but don' know how to access row in if __name__ == '__main__':

Program

from multiprocessing import Process
import time
from pysitemap import crawler
import pandas

#describe function with parameter crawler
def do_actions(crawler):
    #pandas to read excel file in excel file urls of different website, absolute path is path of excel sheet
    df = pandas.read_excel(r'Absolute path')
    for index, row in df.iterrows():
        #data is name of column of excel sheet(urls)
        Url=row['data']
        try:
            #crawler used to crawl urls that are in excel sheet
            crawler(Url, out_file=f'{index}sitemap.xml')
        except Exception as e:
            print (e)
            pass
if __name__ == '__main__':
    #create a Process
    action_process = Process(target=row)
    # start a process
    action_process.start()
    #timeout 180 seconds
    action_process.join(timeout=180)
    #process alive after 180s terminate
    if action_process is alive():
        action_process.terminate()

1 Answers1

0

You want to pass a callable as the target function, and not the return value of the function itself. Also, the specific attribute you want to check for is process.is_alive(). Lastly, while it doesn't affect your program, you should instantiate action_process under the if __name__ ... block. Think of it this way, do your child processes require access to this variable? If not, throw it under that block.

if __name__ == '__main__':
    action_process = Process(target=do_actions, args=(crawler, ))
    # start a process
    action_process.start()
    #timeout 180 seconds
    action_process.join(timeout=180)
    #process alive after 180s terminate
    if action_process.is_alive():
        action_process.terminate()
Charchit Agarwal
  • 2,829
  • 2
  • 8
  • 20