I have to merge two text files into one and build a new list from the result. The first file contains URLs and the other contains URL paths/folders, and every path has to be applied to EVERY URL. I'm working with lists, and it's really slow, because there are roughly 200,000 items.
Sample:
urls.txt:
http://www.google.com
....
paths.txt:
/abc
/bce
....
Later, after the loop has finished, there should be a new list containing:
http://www.google.com/abc
http://www.google.com/bce
Python Code:
import re

URLS_TO_CHECK = []  # defined as global, needed later

def generate_list():
    urls = open("urls.txt", "r").read().splitlines()
    paths = open("paths.txt", "r").read().splitlines()
    done = open("done.txt", "r").read().splitlines()  # previously processed URLs

    for u in urls:
        match = re.search('(http://(.+?)....)', u)  # needed
        base = match.group(1)
        for p in paths:
            url = "%s%s" % (base, p)
            if url not in URLS_TO_CHECK:  # O(n) scan over a growing list - slow!
                if url not in done:       # another O(n) list scan
                    URLS_TO_CHECK.append(url)
I've already read some other threads suggesting the map function and disabling gc, but I can't use map with my program, and disabling gc didn't really help.
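For what it's worth, the slowness almost certainly comes from the two `not in` list scans, which each cost O(n) on a list of 200,000 items, not from `append` or gc. A minimal sketch of the same logic with set-based O(1) membership checks (the function name `generate_urls` is mine, and the regex step is left out here so the combining/dedup logic stands alone):

```python
def generate_urls(urls, paths, done):
    """Combine every base URL with every path, skipping URLs in `done`.

    `done_set` and `seen` are sets, so each membership test is O(1)
    instead of an O(n) scan over a 200,000-item list.
    """
    done_set = set(done)  # one-time conversion; lookups are O(1) afterwards
    seen = set()          # tracks URLs already added, for duplicate detection
    result = []
    for base in urls:
        for p in paths:
            url = base + p
            if url not in seen and url not in done_set:
                seen.add(url)
                result.append(url)
    return result
```

The separate `result` list is only there to preserve insertion order; if order doesn't matter, you could build a single set directly and subtract `done_set` at the end.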