
I'm trying to run a script from the Django shell that uses `bulk_create` to populate a database from a CSV file. I'm not sure whether my pandas code or my Django model is to blame. I'm using Python 3, and I'm not sure whether that affects things either. I'm getting pretty lost in the Django docs.

I want to import this CSV from Kaggle: https://www.kaggle.com/weil41/flights/data

script:

    import pandas as pd
    from .models import Flight

    data = pd.read_csv('data/Flights.csv', sep=',')

    # year,month,day,dep_time,dep_delay,arr_time,arr_delay,cancelled,
    # carrier,tailnum,flight,origin,dest,air_time,distance,hour,min
    flights = [
        Flight(
            year = data.ix[row]['year'],
            month = data.ix[row]['month'],
            day = data.ix[row]['day'],
            dep_time = data.ix[row]['dep_time'],
            dep_delay = data.ix[row]['dep_delay'],
            arr_time = data.ix[row]['arr_time'],
            arr_delay = data.ix[row]['arr_delay'],
            cancelled = data.ix[row]['cancelled'],
            carrier = data.ix[row]['carrier'],
            tailnum = data.ix[row]['tailnum'],
            flight = data.ix[row]['flight'],
            origin = data.ix[row]['origin'],
            dest = data.ix[row]['dest'],
            air_time = data.ix[row]['air_time'],
            distance = data.ix[row]['distance'],
            hour = data.ix[row]['hour'],
            min = data.ix[row]['min'],
        )
        for row in data
    ]
    Flight.objects.bulk_create(flights)

models.py

    from django.db import models

    # year,month,day,dep_time,dep_delay,arr_time,arr_delay,cancelled,
    # carrier,tailnum,flight,origin,dest,air_time,distance,hour,min

    class Flight(models.Model):
        year = models.CharField(max_length=100, default='')
        month = models.CharField(max_length=100, default='')
        day = models.CharField(max_length=100, default='')
        dep_time = models.CharField(max_length=100, default='')
        arr_time = models.CharField(max_length=100, default='')
        arr_delay = models.CharField(max_length=100, default='')
        cancelled = models.CharField(max_length=100, default='')
        carrier = models.CharField(max_length=100, default='')
        tailnum = models.CharField(max_length=100, default='')
        flight = models.CharField(max_length=100, default='')
        origin = models.CharField(max_length=100, default='')
        dest = models.CharField(max_length=100, default='')
        air_time = models.CharField(max_length=100, default='')
        distance = models.CharField(max_length=100, default='')
        hour = models.CharField(max_length=100, default='')
        min = models.CharField(max_length=100, default='')

        def __str__(self):
            return f'{self.flight} {self.dest} {self.year} {self.month} {self.day}'

The error I get is `KeyError: "'__name__' not in globals"`.

Error message:

    exec(open('calendarapp/get_data.py').read())
    Traceback (most recent call last):
      File "<console>", line 1, in <module>
      File "<string>", line 2, in <module>
    KeyError: "'__name__' not in globals"

Davtho1983

1 Answer


See this question for a similar case.

Based on the solution there, you could try to change the import statement from

    from .models import Flight

to

    from [app_name].models import Flight

In your case it seems this would result in:

    from calendarapp.models import Flight
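A possible explanation for why this helps (an assumption, not confirmed in this thread): a relative import such as `from .models import Flight` needs `__package__`, `__spec__` or `__name__` in the executing namespace to work out its parent package, and code run through `exec()` inside the shell may not have any of them. The minimal sketch below uses a bare dict as a stand-in for that namespace and the stdlib `json` package in place of `calendarapp`; the helper name `run` is hypothetical.

```python
import warnings


def run(source):
    """exec() `source` in a bare namespace (assumed to mimic the shell's)."""
    namespace = {}  # no __name__, __package__ or __spec__ entries
    try:
        with warnings.catch_warnings():
            # CPython emits an ImportWarning before failing; silence it here.
            warnings.simplefilter("ignore", ImportWarning)
            exec(source, namespace)
    except (KeyError, ImportError) as exc:
        return f"{type(exc).__name__}: {exc}"
    return "ok"


# Relative import: there is no package context to resolve "." against.
relative_result = run("from . import decoder")
# Absolute import: the full dotted path needs no package context.
absolute_result = run("from json import decoder")

print(relative_result)
print(absolute_result)
```

The absolute form only names the package explicitly, so it works no matter what the surrounding namespace contains, which is why `from calendarapp.models import Flight` sidesteps the error.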

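EDIT: I suggest changing your iteration procedure. `for row in data` iterates over the DataFrame's column labels (`'year'`, `'month'`, ...), not over its rows, so `data.ix[row]` ends up looking for a row labelled `'year'`, which is a likely source of a `KeyError: 'year'`.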

    flights = [
        Flight(
            year = row['year'],
            # ... the remaining fields follow the same pattern
        )
        for _, row in data.iterrows()
    ]
    Flight.objects.bulk_create(flights)

Note that I used pandas `iterrows()`, which yields one `(index, row)` pair per row and makes the code a bit more readable.
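To see the difference between the two loops, here is a minimal sketch with a hypothetical two-row CSV that uses a subset of the Kaggle column names (read from an in-memory string instead of `data/Flights.csv`):

```python
import io

import pandas as pd

# Hypothetical two-row sample using a subset of the Kaggle column names.
CSV_TEXT = """year,month,day,carrier,flight,dest
2013,1,1,UA,1545,IAH
2013,1,1,AA,1141,MIA
"""

data = pd.read_csv(io.StringIO(CSV_TEXT))

# Iterating over a DataFrame yields its column labels, not its rows,
# so `for row in data` hands 'year', 'month', ... to data.ix[row].
column_labels = list(data)
print(column_labels)

# iterrows() yields one (index, Series) pair per row; row.to_dict() turns
# each row into a dict keyed by column name.
row_kwargs = [row.to_dict() for _, row in data.iterrows()]
print(row_kwargs[0]["dest"])
```

With Django available, each dict could be passed as `Flight(**kwargs)`, but only if every CSV column has a matching model field; the model in the question has no `dep_delay` field, so that column would have to be dropped or added to the model first.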

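You could read through this post for some context on how to use `.ix` (or why not to use it). In short, `.ix` mixed label-based and position-based lookups; it was deprecated in pandas 0.20 and removed in pandas 1.0 in favour of `.loc` (labels) and `.iloc` (integer positions).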
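A minimal sketch of the two replacements, using a small hypothetical DataFrame in place of the Kaggle data:

```python
import pandas as pd

# Hypothetical frame standing in for the Kaggle data.
data = pd.DataFrame({"year": [2013, 2013], "dest": ["IAH", "MIA"]})

# .loc selects by label: row label 1, column label "dest".
dest_of_second_row = data.loc[1, "dest"]

# .iloc selects by integer position: first row, first column ("year").
year_of_first_row = data.iloc[0, 0]

# Positional access needs integer row positions, not column labels,
# so loop over range(len(data)) rather than over the DataFrame itself.
years = [data.iloc[i, 0] for i in range(len(data))]

print(dest_of_second_row, year_of_first_row, years)
```

For building model instances, `iterrows()` is usually simpler than indexing back into the frame by position on every field.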

Also, note that `bulk_create` does not set the primary key (`id`) on the objects it creates unless the database backend supports that (at the time of writing, only PostgreSQL).

Ralf
    I don't know why this is happening, but it seems to have solved the problem in the past. – Ralf May 03 '18 at 14:27
  • That does solve something, it seems! Now I'm getting `KeyError: 'year'`? – Davtho1983 May 03 '18 at 14:33
  • @Davtho1983 see my updated answer; maybe that is the cause of the `KeyError`. – Ralf May 03 '18 at 14:46
  • You're right about the typo. I think I'm using a version of pandas where `ix` is deprecated anyway, so that might affect it. I changed the syntax to `year = data.iloc[row, 0]`, which gives a ValueError, so it's not iterating over it by variable `row`? – Davtho1983 May 03 '18 at 15:02
  • @Davtho1983 that depends: does your variable `data` have an index with names for the columns or just positions? You can find out with `print(data.columns)`. – Ralf May 03 '18 at 15:09
  • Ah no! I get TypeError: 'Index' object is not callable - but when I print(data.head()) it seems to recognise the column names as names of columns? Do I need to specify column names in the read_csv call? – Davtho1983 May 03 '18 at 15:12
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/170302/discussion-between-ralf-and-davtho1983). – Ralf May 03 '18 at 15:17
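The last two comments come down to two pandas details: `read_csv` takes the column names from the first line of the file by default (its `header` parameter defaults to inferring them), so no names need to be passed; and `columns` is an attribute holding an `Index`, not a method. A minimal sketch with a hypothetical one-row CSV:

```python
import io

import pandas as pd

# Hypothetical CSV: by default read_csv treats the first line as the header.
data = pd.read_csv(io.StringIO("year,month,dest\n2013,1,IAH\n"))

# `columns` is an attribute, so it is read as data.columns ...
print(data.columns)
print(list(data.columns))

# ... and calling it like a method raises the TypeError seen above.
error_message = ""
try:
    data.columns()
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```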