
I'm trying to save a very large .csv file from a Django upload form into MongoDB. It works, but it takes too long to process, so I decided to use multiprocessing. That shortened the time a little, but the CPU cores went to 100%, so I would like to make my code less computationally expensive. Here is my code:

views.py



def save_to_db(line): # I think this part cost the most time
    column = line.split(",")
    kokyaku.objects.create(
        顧客CD=int(column[0]), 顧客補助CD=int(column[1]),
        顧客名称s=str(column[2]), 顧客名称=str(column[3]),
        顧客名称カナ=str(column[4]), 法人名称=str(column[5]),
        代表者名称=str(column[6]), 住所=str(column[7]),
        電話番号=str(int(column[8])), 地区名称=str(column[9]),
        データマッチ用電話番号=int(column[10]), 契約状態=str(column[11]),
    )




def upload(request):
    data = {}
    if "GET" == request.method:
        return render(request, "main/upload.html", data)
    # if not GET, then proceed

    csv_file = request.FILES["csv_file"]
    file_type = request.POST.get("type", "")
    if not csv_file.name.endswith('.csv'):
        messages.error(request,'File is not CSV type')
        return HttpResponseRedirect(reverse("upload"))

    file_data = csv_file.read().decode("utf-8")
    lines = file_data.split("\n")
    if file_type == "val3":
        with concurrent.futures.ProcessPoolExecutor() as executor:
            executor.map(save_to_db, lines)
    return HttpResponseRedirect(reverse("upload"))

By the way, here is my model class from models.py, in case it helps:


class kokyaku(models.Model):
    顧客CD = models.IntegerField(blank=True)
    顧客補助CD = models.IntegerField(blank=True)
    顧客名称s = models.TextField(blank=True)
    顧客名称 = models.TextField(blank=True)
    顧客名称カナ = models.TextField(blank=True)
    法人名称 = models.CharField(max_length=15, blank=True)
    代表者名称 = models.CharField(max_length=15, blank=True)
    住所 = models.TextField(blank=True)
    地区名称 = models.TextField(blank=True)
    電話番号 = models.IntegerField(blank=True)
    データマッチ用電話番号 = models.IntegerField(blank=True)
    契約状態 = models.CharField(max_length=2, blank=True)

    def __str__(self):
        string = str(self.顧客CD) + " - " + self.顧客名称
        return string

1 Answer


You can try using bulk_create, which generally inserts all the objects in a single query instead of one query per row, but I'm not sure whether this works with MongoDB.

def save_to_db(line):
    column = line.split(",")
    return kokyaku(col0 = column[0], col1 = column[1], ... )

def upload(request):

    ...

    lines = file_data.split("\n")
    kokyaku.objects.bulk_create(map(save_to_db, lines))

    return HttpResponseRedirect(reverse("upload"))

Update

Here is another solution using a try/except block, which skips lines whose values cannot be converted:

def upload(request): 

    ...

    lines = file_data.split("\n")
    items = []
    for line in lines:
        column = line.split(',')
        try:
            item = kokyaku(col0 = int(column[0]), ...)
            items.append(item)
        except ValueError:
            print(line)  # log the raw line; `item` was never assigned for this row

    kokyaku.objects.bulk_create(items)
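
The `ValueError: invalid literal for int() with base 10: ''` reported in the comments below may come from a blank line, for example the empty string that `split("\n")` leaves after a trailing newline. A minimal sketch of one way to filter such rows and feed `bulk_create` in chunks follows. The helper names `iter_valid_rows` and `batched`, the default column indexes, and the chunk size are hypothetical and only mirror the `int()` calls in the question; they are not part of Django.

```python
import csv
import io
from itertools import islice


def iter_valid_rows(text, int_columns=(0, 1, 8, 10), expected_len=12):
    """Yield parsed CSV rows, skipping blank, short, or non-numeric ones.

    int_columns and expected_len are assumptions taken from the
    question's save_to_db(); adjust them to the real file layout.
    """
    reader = csv.reader(io.StringIO(text))
    for row in reader:
        # csv.reader returns [] for blank lines; short rows are skipped too.
        if len(row) < expected_len:
            continue
        try:
            for i in int_columns:
                row[i] = int(row[i])
        except ValueError:
            # A value such as '' cannot be converted; skip the whole row.
            continue
        yield row


def batched(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# Inside the view (not executed here, requires the Django model):
# for rows in batched(iter_valid_rows(file_data), 1000):
#     kokyaku.objects.bulk_create([kokyaku(顧客CD=r[0], ...) for r in rows])
```

Using `csv.reader` instead of `line.split(",")` also copes with quoted fields that contain commas. Whether chunked inserts are actually faster with a MongoDB backend would need to be measured.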
Lord Elrond
  • thanks for your reply, I tried it, `bulk` seems to work, but probably the data order or something went wrong. `File "/home/marcel/Downloads/blog/main/views.py", line 22, in save_to_db return kokyaku(顧客CD = int(column[0]), 顧客補助CD = int(column[1]), ValueError: invalid literal for int() with base 10: '' ` – Andrejovic Andrej Dec 25 '19 at 02:23
  • @AndrejovicAndrej are you calling `int` on a decimal or a string? See [this question](https://stackoverflow.com/a/8948303/10746224) for a reference to your error. – Lord Elrond Dec 25 '19 at 02:25
  • It's about 50,000 rows, but it doesn't look like there is a decimal – Andrejovic Andrej Dec 25 '19 at 02:28
  • @AndrejovicAndrej I added an update. See if that helps – Lord Elrond Dec 25 '19 at 02:41
  • Thanks for the update! I feel I'm near a solution. This time it starts the loop, but on the second iteration it throws: `line 463, in bulk_create objs = list(objs) TypeError: 'kokyaku' object is not iterable ` – Andrejovic Andrej Dec 25 '19 at 02:55
  • @AndrejovicAndrej it sounds like you are trying to pass a `kokyaku` object to `bulk_create`. Make sure you are passing a *list of kokyakus*. – Lord Elrond Dec 25 '19 at 03:05
  • oh god, I'm stupid :-D Thanks! It's still loading, but it seems to work! – Andrejovic Andrej Dec 25 '19 at 03:14
  • btw. It seems that it's not faster, I printed the times and this part `item = kokyaku(顧客CD = int(column[0]),....` takes too long. Is it because I'm calling `kokyaku` every time? – Andrejovic Andrej Dec 25 '19 at 03:26