1

after scraping I get a list of dictionaries which I convert to dataframe then do some calculations on it. The dictionaries found are of correct order but when I convert them to df they lose the order and become alphabetical. here is my code:

import scrapy
from openpyxl import Workbook
import matplotlib.pyplot as plt
import pandas as pd
from PIL import Image
import numpy as np
#spider here
process = CrawlerProcess()
process.crawl(PythonEventsSpider)
spider = next(iter(process.crawlers)).spider
process.start()
# the order of dictionary is ['Name', 'Price', 'Link', 'Rating', 'Number of views',  'Price 1',
    #'Price 2', 'Price 3', 'Price 4', 'Fee', 'Price After Fee',
     #'Profit', 'Profit Percentage', 'Product ,'Notes']

df = pd.DataFrame(spider.found_items)
df.to_excel("{}.xlsx".format(file_name),sheet_name='All Products') # gets written in alphabetical order
#rest of code...

I checked this question How to sort a data frame by multiple column(s),but it is python 2 and some of these order techniques are depreciated

1 Answers1

0

Assign a list of keys that you can then select the columns from the dataframe in the desired order with:

keys = ['Name', 'Price', 'Link', 'Rating', 'Number of views', 'Price 1', 'Price 2', 'Price 3', 'Price 4', 'Fee', 'Price After Fee', 'Profit', 'Profit Percentage', 'Product' , 'Notes']

df = pd.DataFrame(spider.found_items)[keys]
df.to_excel("{}.xlsx".format(file_name), sheet_name='All Products')
cwalvoort
  • 1,851
  • 1
  • 18
  • 19
  • like this `df.sort(csv_columns,ascending=[True for i in range(len(csv_columns))],inplace=True)` is better looking, I will test now –  May 14 '19 at 20:15
  • I updated my answer, I originally read your question incorrectly. I thought you wanted to sort the data, but's it's the columns that are in the wrong order. Sorry about that! – cwalvoort May 14 '19 at 20:19
  • I know I updated my answer, but to comment on your comment, `[True]*len(df.keys())` is even cleaner. – cwalvoort May 14 '19 at 20:22