
I am given a csv file that looks something like this

ID, name, age, city
1, Andy, 25, Ann Arbor
2, Bella, 40, Los Angeles
3, Cathy, 13, Eureka
...
...

I want to get the city for ID=3, which would be Eureka in this example. Is there a way to do this efficiently instead of iterating over each row? My PHP code will be executing this Python script each time to get the value, and it feels very inefficient to loop through the CSV file every time.

Megool

3 Answers


Iterate over the file once and save the data into a dictionary:

import csv

data = {}
with open('input.csv') as fin:
    # skipinitialspace handles the spaces after the commas in the sample CSV
    reader = csv.DictReader(fin, skipinitialspace=True)
    for record in reader:
        data[record['ID']] = {k: v for k, v in record.items() if k != 'ID'}

then just access the required key in the dictionary:

print(data['3']['city'])  # Eureka -- note DictReader keys are strings, so use '3', not 3

In case you want to persist the data in key:value format, you can save it as a JSON file:

import json
import csv

j = {}
with open('input.csv') as fin:
    reader = csv.DictReader(fin, skipinitialspace=True)
    for record in reader:
        j[record['ID']] = {k: v for k, v in record.items() if k != 'ID'}

with open('output.json', 'w') as fout:
    json.dump(j, fout)
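Once that file exists, a later run can skip CSV parsing entirely and load the mapping straight from JSON. A minimal sketch (the inlined sample dict stands in for the data written above):

```python
import json

# Sample data standing in for the mapping written to output.json above.
sample = {'3': {'name': 'Cathy', 'age': '13', 'city': 'Eureka'}}
with open('output.json', 'w') as fout:
    json.dump(sample, fout)

# A later run only needs this part: load the mapping and index into it.
with open('output.json') as fin:
    data = json.load(fin)

# DictReader produced string keys, so look up '3', not the integer 3.
city = data['3']['city']
print(city)  # Eureka
```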
yurib
  • Would doing it this way certainly be faster than iterating over the CSV file each time, assuming the CSV file has quite a few lines? – Megool May 31 '15 at 05:19
  • Why bother writing the data as JSON? Reading the file (which may be written by an external system) into memory as a dict should suffice. – johnsyweb May 31 '15 at 05:20
    @YunCHan yes, a dictionary uses a hash to access values directly as opposed to iterating over all values in a list – yurib May 31 '15 at 05:25

In a word: no.

As yurib mentioned, one method is to convert your files to JSON and go from there, or just to dump to a dict. This gives you the ability to do things like pickle if you need to serialize your dataset, or shelve if you want to stash it someplace for later use.
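For instance, a pickled dict can be written once and reloaded on later runs without touching the CSV again. A hedged sketch (the file name `data.pkl` and the inlined sample dict are illustrative, not from the question):

```python
import pickle

# Stand-in for the dict built from the CSV, as in yurib's answer.
data = {'3': {'name': 'Cathy', 'age': '13', 'city': 'Eureka'}}

# One-time serialization of the parsed data.
with open('data.pkl', 'wb') as fout:
    pickle.dump(data, fout)

# Later runs deserialize instead of re-parsing the CSV.
with open('data.pkl', 'rb') as fin:
    loaded = pickle.load(fin)

print(loaded['3']['city'])  # Eureka
```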

Another option is to dump your CSV into a queryable database using something like Python's built-in sqlite3 support. It depends on where you want your overhead to lie: pre-processing your data this way saves you from parsing a large file every time your script runs.

Check out this answer for a quick rundown.

Community

I want to get the city for ID=3, which would be Eureka in this example. Is there a way to do this efficiently instead of iterating over each row? My PHP code will be executing this Python script each time to get the value, and it feels very inefficient to loop through the CSV file every time.

Your ideal solution is to wrap this Python code into an API that you can call from your PHP code.

On startup, the Python code would load the file into a data structure, and then wait for your request.

If the file is very big, your Python script would load it into a database and read from there.

You can then choose to return either a string, or a json object.

Here is a sample, using Flask:

import csv
from flask import Flask, request, abort

with open('somefile.txt') as f:
    reader = csv.DictReader(f, skipinitialspace=True)
    rows = list(reader)
    keys = rows[0].keys()

app = Flask(__name__)

@app.route('/<id>')
def get_item(id):
    key = request.args.get('key')
    if key not in keys:
        abort(400)  # this is an invalid request
    try:
        # DictReader yields string values, and the sample header names the column 'ID'
        result = next(i for i in rows if i['ID'] == id)
    except StopIteration:
        # ID passed doesn't exist
        abort(400)
    return result[key]

if __name__ == '__main__':
    app.run()

You would call it like this:

http://localhost:5000/3?key=city
Burhan Khalid