I have a scraper for Yellow Pages. After scraping the desired categories, it saves all the data in a CSV file named parent.csv. The file has a keyword column that holds the category of each listed business. I want to split the rows by keyword and generate a separate CSV file for each category. I have implemented the following in the spider_closed function:

def spider_closed(self, spider):
    with open('parent.csv', 'r') as file:
        reader = csv.reader(file)
        headers = next(reader, None)
        next(reader, None)
        for row in reader:

            with open('{}.csv'.format(row[0]), 'a') as f:
                writer = csv.writer(f)
                

                writer.writerow(row)

With this I have been able to separate the categories successfully, but there is a problem with the headers: I want the headers to be written to each new file as well. In addition, the data in the new CSV files has an extra blank line after each row. I need to solve both of these problems. Any help would be appreciated.

[![Parent.csv: the parent.csv file that the spider generates successfully][1]][1]

[![Separated file go karts.csv: the entries are then separated by keyword into a new CSV file per keyword. For example, all rows with the keyword "go karts" must end up in go karts.csv, and so on][2]][2]

  [1]: https://i.stack.imgur.com/Ucgym.png
  [2]: https://i.stack.imgur.com/3NVKo.png

rex sphinx

1 Answer

To add the headers to each file, check whether the file already exists before appending to it. If it does not exist yet, create it and write the headers you have already read from parent.csv:

import csv
import os.path

def spider_closed(self, spider):
    with open('parent.csv', 'r') as file:
        reader = csv.reader(file)
        headers = next(reader, None)
        for row in reader:
            # If the file does not already exist, create it with the headers
            if not os.path.exists('{}.csv'.format(row[0])):
                with open('{}.csv'.format(row[0]), 'w') as f:
                    writer = csv.writer(f)
                    writer.writerow(headers)

            with open('{}.csv'.format(row[0]), 'a') as f:
                writer = csv.writer(f)
                writer.writerow(row)

Regarding the extra blank lines, there is already a good answer in the question "CSV file written with Python has blank lines between each row".
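
As a rough sketch of how both fixes could be combined in a single pass (this is not the code from the answer above), the example below keeps one writer per keyword and writes the headers when a keyword is first seen. It assumes Python 3 and, as in the question, that the keyword is in the first column (`row[0]`). The names `split_by_keyword` and `out_dir` are hypothetical. Opening the files with `newline=''` is what the Python `csv` module documentation recommends, and it should prevent the blank line between rows on Windows.

```python
import csv
import os


def split_by_keyword(parent_path, out_dir="."):
    """Split parent_path into one CSV file per value of its first column.

    Returns a dict mapping each keyword to the number of data rows written.
    (Hypothetical helper; the name and signature are illustrative only.)
    """
    counts = {}
    handles = {}  # keyword -> open output file
    writers = {}  # keyword -> csv.writer bound to that file
    try:
        # newline='' lets the csv module handle line endings itself
        with open(parent_path, "r", newline="") as src:
            reader = csv.reader(src)
            headers = next(reader, None)  # read the header row exactly once
            if headers is None:
                return counts  # empty input file
            for row in reader:
                if not row:
                    continue  # skip completely empty lines
                keyword = row[0]  # assumption: keyword is the first column
                if keyword not in writers:
                    path = os.path.join(out_dir, "{}.csv".format(keyword))
                    # "w" truncates any file left over from an earlier run
                    handles[keyword] = open(path, "w", newline="")
                    writers[keyword] = csv.writer(handles[keyword])
                    writers[keyword].writerow(headers)
                    counts[keyword] = 0
                writers[keyword].writerow(row)
                counts[keyword] += 1
    finally:
        # close every output file, even if an error occurred
        for f in handles.values():
            f.close()
    return counts
```

Inside `spider_closed` this could be called as `split_by_keyword('parent.csv')`. Because each output file is opened once in write mode, repeated runs start from clean files instead of appending to old ones. The trade-off is that every category file stays open until the loop finishes, which should be fine for a modest number of categories.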

Fred
  • Thank you. Just one more query: the first element in parent.csv is not being transferred to its CSV file, so one of the category files is always one item short. Kindly guide me on how to fix that. – rex sphinx May 22 '20 at 10:47
  • Just looked at your original code: after the line `headers = next(reader, None)` you have an extra line `next(reader, None)`, and this is where you lost your row. I've removed it from my answer. Don't forget to mark this as the correct answer if you're happy! – Fred May 22 '20 at 11:56
  • No worries! Glad I can help you out – Fred May 22 '20 at 12:25
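
The lost row discussed in the comments can be reproduced in isolation. The sketch below uses made-up sample data and a hypothetical helper, `read_rows`, to show that each call to `next(reader, None)` consumes one row, so a second call after reading the headers silently drops the first data row.

```python
import csv
import io


def read_rows(text, extra_next=False):
    """Return (headers, data_rows) parsed from CSV text.

    With extra_next=True the second next() call from the question is
    repeated, which discards the first data row.
    """
    reader = csv.reader(io.StringIO(text))
    headers = next(reader, None)  # consumes the header row
    if extra_next:
        next(reader, None)  # consumes (and loses) the first data row
    return headers, list(reader)


# Made-up sample data, for illustration only
sample = "keyword,name\ngo karts,A\ngo karts,B\n"

headers, rows = read_rows(sample)
_, rows_with_bug = read_rows(sample, extra_next=True)
print(len(rows), len(rows_with_bug))
```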