Is there a way to combine the final result into a single file when using Spark? Here's the code I have:
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("logs").setMaster("local[*]")
sc = SparkContext(conf=conf)

# Each log line is tab-separated; the URL is the third field
logs_1 = sc.textFile('logs/logs_1.tsv')
logs_2 = sc.textFile('logs/logs_2.tsv')
urls_1 = logs_1.map(lambda line: line.split("\t")[2])
urls_2 = logs_2.map(lambda line: line.split("\t")[2])

# Keep only the URLs that appear in both logs, excluding localhost
all_urls = urls_1.intersection(urls_2)
all_urls = all_urls.filter(lambda url: url != "localhost")

all_urls.collect()
all_urls.saveAsTextFile('logs.csv')
The collect() call doesn't seem to do anything here (or I've misunderstood its purpose). Essentially, I need saveAsTextFile to output a single file instead of a directory of part files.
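In case it helps, here's a sketch of the two approaches I'm guessing at (the output paths below are made up): coalescing to one partition before saving, or collecting the results to the driver and writing them with plain Python. Would either of these be the right way to do it?

# Guess 1: coalesce to a single partition, so Spark writes one part file
# (still inside an output directory, if I understand correctly)
all_urls.coalesce(1).saveAsTextFile('logs_single')

# Guess 2: bring the results back to the driver as a Python list
# and write them out myself ('logs.csv' here is a plain local file)
with open('logs.csv', 'w') as f:
    for url in all_urls.collect():
        f.write(url + '\n')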