
I have a CSV file with over 400k rows and about 20 columns so it takes too much time for me to search the CSV when I'm looking for a certain value. I was wondering how I could map the file so that I can search it efficiently and quickly for a certain value.

Thank you in advance! I'm happy to answer any questions!

Soroush Ghodsi

3 Answers


The pandas.read_csv function makes this fairly easy. Python's standard library also has csv reading/writing functions.
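If you would rather avoid the pandas dependency, here is a minimal sketch with the standard library's csv module; the file path, column name, and search value are placeholders, not from the question. Note that this still rescans the whole file on every search:

import csv

def find_rows(path, column, value):
    """Return all rows whose given column equals value."""
    with open(path, newline='') as f:
        reader = csv.DictReader(f)  # uses the header row for keys
        return [row for row in reader if row[column] == value]

matches = find_rows('yourcsv.csv', 'name', 'STARBUCKS')
print(len(matches))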

Update: Here is a link to a post from the pandas creator on fast searching in pandas.

Mark

If I were doing something like this, I would load the CSV file into a SQLite database file and then use the Python bindings for SQLite to search the data. SQLite is free and widely used, so you can easily find support, code snippets, etc. This approach does require some basic database concepts, mostly simple SQL statements.
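A minimal sketch of that idea using Python's built-in sqlite3 and csv modules; the file names, table name, and the indexed "name" column are placeholders, not from the answer:

import csv
import sqlite3

# Load the CSV into a SQLite table once, index the column you
# search on, then query it with plain SQL.
conn = sqlite3.connect('data.db')
cur = conn.cursor()

with open('yourcsv.csv', newline='') as f:
    reader = csv.reader(f)
    header = next(reader)
    cols = ', '.join('"%s"' % c for c in header)
    marks = ', '.join(['?'] * len(header))
    cur.execute('DROP TABLE IF EXISTS data')
    cur.execute('CREATE TABLE data (%s)' % cols)
    cur.executemany('INSERT INTO data VALUES (%s)' % marks, reader)

# An index makes repeated lookups on this column fast.
cur.execute('CREATE INDEX idx_name ON data ("name")')
conn.commit()

for row in cur.execute('SELECT * FROM data WHERE "name" = ?', ('STARBUCKS',)):
    print(row)
conn.close()

The load and the index are a one-time cost; every later search hits the index instead of rescanning all 400k rows.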

mhopeng
  • How would I load the CSV file into a SQLite file? – Soroush Ghodsi Mar 02 '16 at 23:00
  • A tutorial on SQLite is out of scope for an SO question... But such tutorials are easy to find. Install SQLite and try it out, you'll have your solution going in a couple hours. – mhopeng Mar 03 '16 at 02:35

Assuming you can store the whole thing in RAM (i.e., load it all at once), I would use pandas.

import pandas as pd

csv_loc = 'C:/yourcsv.csv'
# Docs: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
df = pd.read_csv(csv_loc, sep=',', header=0)

df['name']  # selects the column called "name"

# Boolean mask of observations whose name starts with STARBUC;
# na=False treats missing values as non-matches, and dropping the
# capture group around ^STARBUC avoids a pandas warning.
match = df['name'].str.contains('^STARBUC', na=False)
print(df['name'][match].value_counts())

Etc. This is a great tutorial, and so is this one.
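If the same column gets searched over and over, one more trick (a sketch; the path and the "name" column are placeholders) is to make that column the DataFrame's index, so exact-value lookups use the index's hash table instead of scanning all 400k rows each time:

import pandas as pd

df = pd.read_csv('C:/yourcsv.csv')
by_name = df.set_index('name', drop=False)  # keep the column too

# All rows whose name is exactly STARBUCKS, without a full scan
print(by_name.loc['STARBUCKS'])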

mptevsion