This question touches on several multiprocessing topics and scenarios.
Stack Overflow
I've searched Stack Overflow and, although I've found many questions on this topic, I haven't found an answer that fits my situation; I'm not a strong Python programmer, so I couldn't adapt their answers to my needs. I have been working on this multiprocessing problem for more than two weeks.
I have looked on YouTube, Stack Overflow, Google, and GitHub to no avail:
https://www.youtube.com/watch?v=fKl2JW_qrso
https://www.youtube.com/watch?v=35yYObtZ95o
https://www.youtube.com/watch?v=IT8RYokUvvQ
kill a function after a certain time in windows
Creating a timeout function in Python with multiprocessing
Handle multiprocessing.TimeoutError in multiprocessing pool.map_async()
and many others.
Target Point
I am generating sitemaps for different websites. The website URLs are stored in an Excel sheet, and 'data' (used in the code below) is the name of the column that holds them. Some websites take a very long time to crawl; if a website takes more than 3 minutes, I want to stop its process, keep whatever data was crawled in those 3 minutes, and start a new process for the next website.
I target the row, but I don't know how to access row inside the if __name__ == '__main__': block.
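From the multiprocessing docs I understand that a value normally gets into the child process through the args parameter, roughly like this (a minimal sketch; worker and the example URL are placeholders, not my real code):

from multiprocessing import Process

# placeholder worker that stands in for crawling one URL
def worker(url):
    print(f'would crawl {url}')

if __name__ == '__main__':
    # the URL is handed to the child process through args
    p = Process(target=worker, args=('http://example.com',))
    p.start()
    p.join()

But I can't see how to apply that to my loop over the rows, which currently lives inside the worker function.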
Program
from multiprocessing import Process
import time
from pysitemap import crawler
import pandas

# crawl every URL listed in the Excel sheet
def do_actions(crawler):
    # read the Excel file; 'Absolute path' is the path to the sheet of website URLs
    df = pandas.read_excel(r'Absolute path')
    for index, row in df.iterrows():
        # 'data' is the name of the column that holds the URLs
        Url = row['data']
        try:
            # crawl the URL and write its sitemap
            crawler(Url, out_file=f'{index}sitemap.xml')
        except Exception as e:
            print(e)

if __name__ == '__main__':
    # create a process (this is where I am stuck: row only exists inside do_actions)
    action_process = Process(target=row)
    # start the process
    action_process.start()
    # wait at most 180 seconds
    action_process.join(timeout=180)
    # terminate the process if it is still alive after 180 s
    if action_process.is_alive():
        action_process.terminate()
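Putting those pieces together, here is a rough sketch of the restructuring I think I need: the loop over the rows lives in __main__, and each row gets its own process with a 180-second budget. This assumes crawler writes what it has crawled to out_file before being terminated; I haven't verified that, and if it only writes at the very end, the partial data would be lost:

from multiprocessing import Process
import pandas
from pysitemap import crawler

def do_actions(url, index):
    # crawl a single site and write its sitemap
    try:
        crawler(url, out_file=f'{index}sitemap.xml')
    except Exception as e:
        print(e)

if __name__ == '__main__':
    # read the sheet in the parent process; 'data' is the URL column
    df = pandas.read_excel(r'Absolute path')
    for index, row in df.iterrows():
        # one process per website, so each site gets its own 3-minute budget
        action_process = Process(target=do_actions, args=(row['data'], index))
        action_process.start()
        # wait at most 180 seconds for this site
        action_process.join(timeout=180)
        if action_process.is_alive():
            # still crawling after 180 s: stop it and move on to the next site
            action_process.terminate()
            action_process.join()

Does this per-row process approach make sense, and is terminate() the right way to cut a crawl off at 3 minutes while keeping the data gathered so far?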