In a PostgreSQL database:
```python
from django.db import models

class Persons(models.Model):
    person_name = models.CharField(max_length=10, unique=True)
```
The `persons.csv` file contains 1 million names:
```
$ cat persons.csv
Name-1
Name-2
...
Name-1000000
```
I want to:
- Create the names that do not already exist
- Query the database and fetch the id for each name contained in the csv file.
My approach:
Use the `COPY` command, or the django-postgres-copy application that implements it, and also take advantage of the PostgreSQL 9.5+ upsert feature (`INSERT ... ON CONFLICT`).
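Roughly what I have in mind for the load step (a minimal sketch, not the actual django-postgres-copy API; the target table name `myapp_persons` and the staging table are placeholders, and it assumes the psycopg2 backend so that `copy_expert` is reachable on the raw cursor):

```python
from django.db import connection

def load_names(csv_path):
    with connection.cursor() as cursor:
        # Session-scoped staging table for the raw names.
        cursor.execute("CREATE TEMP TABLE persons_staging (person_name text)")
        # Bulk-load the csv with COPY.
        with open(csv_path) as f:
            cursor.copy_expert(
                "COPY persons_staging (person_name) FROM STDIN WITH (FORMAT csv)", f
            )
        # Upsert: insert only the names that do not already exist.
        cursor.execute(
            "INSERT INTO myapp_persons (person_name) "
            "SELECT DISTINCT person_name FROM persons_staging "
            "ON CONFLICT (person_name) DO NOTHING"
        )
```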
Now all the names in the csv file are also in the database.
I need to get their ids from the database, either in memory or in another csv file, in an efficient way:
Use `Q` objects:

```python
list_of_million_q = <iterate csv and build Q(person_name=...) objects>
million_names = Persons.objects.filter(reduce(operator.or_, list_of_million_q))
```
or
Use `__in` to filter based on a list of names (fetching the ids this way is spelled out in the sketch below):

```python
list_of_million_names = <iterate csv and append names>
million_names = Persons.objects.filter(person_name__in=list_of_million_names)
```
or
?
I do not feel that any of the above approaches for fetching the ids is efficient.
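For concreteness, fetching the ids with `__in` would look roughly like this (the chunking, the chunk size, and the import path are my own guesses; `values_list` is standard Django):

```python
import csv

from myapp.models import Persons  # assumed import path

def fetch_ids(csv_path, chunk_size=10000):
    with open(csv_path) as f:
        names = [row[0] for row in csv.reader(f)]
    name_to_id = {}
    # Query in chunks so that no single statement carries a million parameters.
    for start in range(0, len(names), chunk_size):
        chunk = names[start:start + chunk_size]
        pairs = Persons.objects.filter(person_name__in=chunk).values_list(
            "person_name", "id"
        )
        name_to_id.update(dict(pairs))
    return name_to_id
```

Even chunked, that is still on the order of a hundred queries for a million names.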
Update
There is a third option, along the lines of this post, that should be a great solution combining all of the above.
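My reading of "combines all the above" (my interpretation, not necessarily what the linked post does): after the COPY + upsert from the sketch above, a single join against the session-scoped staging table returns every (name, id) pair, with no `Q` objects and no huge `IN` list. Table names are the same placeholders as before:

```python
from django.db import connection

def fetch_ids_via_staging():
    # Assumes persons_staging was populated earlier in this same session,
    # as in the COPY sketch above (TEMP tables live for the connection).
    with connection.cursor() as cursor:
        cursor.execute(
            "SELECT p.person_name, p.id "
            "FROM myapp_persons p "
            "JOIN persons_staging s ON s.person_name = p.person_name"
        )
        return dict(cursor.fetchall())
```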