0

I am learning how to use python.

For the project I am working on, I have hundreds of datasheets containing a City, Species, and Time (speciesname.csv).
I also have a single datasheet that has all cities in the world with their latitude and longitude point (cities.csv).

My goal is to have 2 more columns for latitude and longitude (from cities.csv) in every (speciesname.csv) datasheet, corresponding to the location of each species. I am guessing my workflow will look something like this: Go into speciesname.csv file and find the location on each line Go into cities.csv and search for the location from speciesname.csv Copy the corresponding latitude and longitude into new columns in speciesname.csv.

I have been unsuccessful in my search for a blog post or someone else with a similar question. I don't know where to start so anyone with a starting point would be very helpful.

Thank you.

Taliamycota
  • 125
  • 7
  • Does this answer your question? [How do I read and write CSV files with Python?](https://stackoverflow.com/questions/41585078/how-do-i-read-and-write-csv-files-with-python) – mkrieger1 Oct 23 '20 at 22:04
  • SQL is a better tool for this sort of thing. However, Python has pandas library which does OK for small number of csv or if file size is not too big and I've posted an answer using pandas. – user2415706 Oct 23 '20 at 22:43

2 Answers2

0

You can achieve it in many ways. The simplest way I can think of to approach this problem is:

  1. collect all cities.csv data inside a dictionary {"cityname":(lat,lon), ...}
  2. read line by line your speciesname.csv and for each line search by key (key == speciesname_cityname) in the dictionary.
  3. when you find a correspondence add all data from the line and the lat & lon separated by comma to a buffer string that has to end with a "\n" char
  4. when the foreach line is ended your buffer string will contains all the data and can be used as input to the write to file function
Mark Tolonen
  • 166,664
  • 26
  • 169
  • 251
Fantantonio
  • 377
  • 4
  • 15
0

Here is a little program that should work if you put it in the same folder as your separate CSVs. I'm assuming you just have 2 sheets, one that is cities and another with the species. Your description saying the cities info is in hundreds of datasheets is confusing since then you say it's all in one csv.

This program turns the two separate CSV files into pandas dataframe format which can then be joined on the common city column. Then it creates a new CSV from the joined data frame.

In order for this program to work, you need to need to install pandas which is a library specifically for dealing with things in tabular (spreadsheet) format. I don't know what system you are on so you'll have to find your own instructions from here: https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html

This is the version if your csv do not have a header, which is when the first row is just some data.

# necessary for the functions like pd.read_csv
import pandas as pd 

species_column_names = ['city','species','time']
speciesname = pd.read_csv('speciesname.csv', names=species_column_names, header=None)

cities_column_names = ['city','lat','long']
cities = pd.read_csv('cities.csv', names=cities_column_names, header=None)
# this joining function relies on both tables having a 'city' column 
combined = speciesname.join(cities.set_index('city'), on='city')

combined_csv = combined.to_csv()

If you already have headers for both files, use these two lines instead to ignore the first row since I don't know how they are spelled/capitalized/whatever and we are joining based on all lower case custom column names:

import pandas as pd 

species_column_names = ['city','species','time']
speciesname = pd.read_csv('speciesname.csv', names=species_column_names, skiprows = 0, header=None)

cities_column_names = ['city','lat','long']
cities = pd.read_csv('cities.csv', names=cities_column_names, skiprows = 0, header=None)
# this joining function relies on both tables having a 'city' column 
combined = speciesname.join(cities.set_index('city'), on='city')

combined_csv = combined.to_csv()
user2415706
  • 932
  • 1
  • 7
  • 19
  • 1
    Great! Please upvote and hit accept on this answer so other people in the future see this float towards the top as answers get added to this page! – user2415706 Oct 24 '20 at 02:29