
I am scraping a table from a website and want to create a pandas DataFrame from it. What is the best method to achieve this in terms of efficiency / best practice?

What I have done is append items to several lists while scraping, one list per column. Once I'm done parsing the table from the website, I create the DataFrame and assign the lists to column names. See below:

import pandas as pd

# Build the DataFrame from the lists collected during scraping, indexed by zip code
zip_df = pd.DataFrame(index=zip_codes)
zip_df['Latitude'] = latitudes
zip_df['Longitude'] = longitudes

There seem to be many different ways to approach this (e.g. Python pandas: fill a dataframe row by row). Is the way I am doing it the most logical, or are there better approaches?

    Adding your values to lists is fastest, but how you add to the lists is where you can gain speed. It depends on how you are reading the information from the website, but instead of iterating through each tag on the site, you should find all the matching tags/paths/ids at once and save them to a list in one go. Finally, for cleaner code you can create your DataFrame like this instead: zip_df = pd.DataFrame({'Latitude': latitudes, 'Longitude': longitudes}) – A.Kot Oct 13 '16 at 22:11
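
A minimal sketch of what that comment suggests, assuming the page is parsed with BeautifulSoup; the inline HTML and the 'lat'/'lng' class names are invented for illustration and stand in for whatever the real page uses:

import pandas as pd
from bs4 import BeautifulSoup

html = """
<table>
  <tr><td class="lat">40.71</td><td class="lng">-74.00</td></tr>
  <tr><td class="lat">34.05</td><td class="lng">-118.24</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect every matching cell in one pass instead of walking tag by tag
latitudes = [float(td.text) for td in soup.find_all("td", class_="lat")]
longitudes = [float(td.text) for td in soup.find_all("td", class_="lng")]

# Build the DataFrame in a single call from a dict of columns
zip_df = pd.DataFrame({"Latitude": latitudes, "Longitude": longitudes})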

1 Answer

# Pair the two lists into (lat, lon) rows and label the columns in one constructor call
zip_df = pd.DataFrame(list(zip(latitudes, longitudes)), index=zip_codes, columns=['Latitude', 'Longitude'])
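
A quick sanity check with made-up values (toy data, just for illustration):

import pandas as pd

zip_codes = ['10001', '90001']
latitudes = [40.75, 33.97]
longitudes = [-73.99, -118.25]

zip_df = pd.DataFrame(list(zip(latitudes, longitudes)), index=zip_codes, columns=['Latitude', 'Longitude'])
print(zip_df)
#        Latitude  Longitude
# 10001     40.75     -73.99
# 90001     33.97    -118.25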