
I have a POST endpoint that takes a CSV file and saves all valid rows from it to my DB models. It is incredibly slow, because the CSV files can be huge. Is there a better way to do it?

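```python
import csv
import io

from django.core.exceptions import ObjectDoesNotExist
from django.db.models import F
from rest_framework import status
from rest_framework.response import Response

# Client, Item, Deal, check_deal_validity and check_item_existence_for_client
# are defined elsewhere in the project. post() is a method of the DRF view.


def post(self, request, *args, **kwargs):
    serializer = self.get_serializer(data=request.data)
    serializer.is_valid(raise_exception=True)
    file = serializer.validated_data['file']
    # Reads and decodes the whole upload into memory at once.
    decoded_file = file.read().decode()
    io_string = io.StringIO(decoded_file)
    reader = csv.reader(io_string)

    for row in reader:
        if check_deal_validity(row):
            # Every valid row costs several queries: lookups, inserts and an update.
            try:
                Client.objects.get(username=row[0])
            except ObjectDoesNotExist:
                client = Client(username=row[0])
                client.save()

            try:
                Item.objects.get(name=row[1])
            except ObjectDoesNotExist:
                item = Item(name=row[1])
                item.save()

            deal = Deal(
                client=Client.objects.get(username=row[0]),
                item=Item.objects.get(name=row[1]),
                total=row[2],
                quantity=row[3],
                date=row[4],
            )
            deal.save()
            # Filter by primary key instead of comparing username to a Client instance.
            Client.objects.filter(pk=deal.client.pk).update(
                spent_money=F('spent_money') + deal.total
            )
            if not check_item_existence_for_client(client=deal.client, item=deal.item):
                deal.client.gems.add(deal.item)

    return Response(status=status.HTTP_204_NO_CONTENT)
```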
  • This sort of question is much better suited to https://codereview.stackexchange.com/ – Sam Mason Jan 25 '20 at 21:34
  • Maybe io.StringIO is the problem. Look [at this question](https://stackoverflow.com/questions/25580925/why-is-stringio-object-slower-than-real-file-object) about file objects and give it a try. – Ohad Jan 25 '20 at 21:34
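Following up on the StringIO comment, here is a minimal sketch of streaming the upload row by row instead of decoding it all into one string first. It assumes the uploaded object yields lines of bytes when iterated (Django's `UploadedFile` behaves this way); `io.BytesIO` stands in for it here, and `iter_csv_rows` is a hypothetical helper name. This mainly reduces peak memory; the per-row database queries are probably still the bigger cost.

```python
import codecs
import csv
import io


def iter_csv_rows(binary_file, encoding="utf-8"):
    """Yield parsed CSV rows from a binary file object, decoding line by line.

    codecs.iterdecode decodes each chunk as it is read, so the whole decoded
    upload never has to sit in memory as a single string.
    """
    for row in csv.reader(codecs.iterdecode(binary_file, encoding)):
        yield row


# io.BytesIO stands in for the uploaded file object in this sketch.
upload = io.BytesIO(b"alice,widget,100,2,2020-01-25\nbob,gadget,50,1,2020-01-26\n")
rows = list(iter_csv_rows(upload))
print(rows)
```

In the view, `reader = iter_csv_rows(file)` would replace the `read()` / `decode()` / `StringIO` lines.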

1 Answer


If this is a big file and you're doing a big insert, the work should happen outside Django's request-response cycle, not while the client waits for the response.

I would suggest integrating an async task queue into your project for this; look at Celery, RQ (Redis Queue), and huey.
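To show the shape of that pattern without depending on any of those libraries, here is a minimal in-process stand-in built on the standard library's `queue` and `threading`. The names `enqueue`, `worker` and `import_deals` are hypothetical. It is not a substitute for a real task queue: jobs are lost on restart and are not shared between server processes, which is what Celery, RQ or huey handle for you.

```python
import queue
import threading
import uuid

# Stdlib stand-in for a real task queue: jobs go onto an in-process queue and a
# background worker thread runs them, so the caller returns immediately.
jobs = queue.Queue()
results = {}


def worker():
    while True:
        job_id, func, args = jobs.get()
        try:
            results[job_id] = ("done", func(*args))
        except Exception as exc:  # record failures instead of killing the worker
            results[job_id] = ("failed", repr(exc))
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()


def enqueue(func, *args):
    """Schedule func(*args) on the worker and return a job id right away."""
    job_id = str(uuid.uuid4())
    results[job_id] = ("pending", None)
    jobs.put((job_id, func, args))
    return job_id


def import_deals(csv_text):
    # Hypothetical placeholder for the slow CSV import; here it only counts rows.
    return len(csv_text.splitlines())


job_id = enqueue(import_deals, "alice,widget,100,2,2020-01-25\nbob,gadget,50,1,2020-01-26")
jobs.join()  # demo only; a view would return immediately instead of waiting
print(results[job_id])
```

In the view, you would enqueue the job and return something like `Response({'job_id': job_id}, status=status.HTTP_202_ACCEPTED)`. With Celery the enqueue step becomes a `.delay(...)` call on a task, typically passing a reference to the stored file rather than the whole CSV text.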

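In addition, you're doing this in a very unoptimized way: every valid row triggers several separate queries (existence checks for the client and item, possible inserts, two more lookups, the `Deal` insert, the `spent_money` update, and the gems check and add). You could use `bulk_create` from the Django ORM to replace many individual inserts with a bulk operation.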
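A minimal sketch of the first half of that approach: a single pass over the CSV that collects the distinct clients and items, the deals to create, per-client totals, and client/item pairs, so the database work can then be done in a few bulk queries. `is_valid` is a hypothetical stand-in for `check_deal_validity`; the column order (username, item, total, quantity, date) is read off the question's code, and parsing `total` as `Decimal` and `quantity` as `int` is an assumption about the model fields.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def is_valid(row):
    # Hypothetical stand-in for the question's check_deal_validity().
    return len(row) == 5 and all(row)


def prepare_batches(rows):
    """Collect everything the import needs in one pass over the CSV rows."""
    usernames = set()
    item_names = set()
    deals = []                    # one tuple per valid row, for Deal bulk_create
    spent = defaultdict(Decimal)  # username -> amount to add to spent_money
    gems = set()                  # distinct (username, item name) pairs
    for row in rows:
        if not is_valid(row):
            continue
        username, item_name, total, quantity, date = row
        amount = Decimal(total)
        usernames.add(username)
        item_names.add(item_name)
        deals.append((username, item_name, amount, int(quantity), date))
        spent[username] += amount
        gems.add((username, item_name))
    return usernames, item_names, deals, dict(spent), gems


data = "alice,widget,100,2,2020-01-25\nbob,gadget,50,1,2020-01-26\nalice,gadget,25.5,1,2020-01-27\nbroken,row\n"
usernames, item_names, deals, spent, gems = prepare_batches(csv.reader(io.StringIO(data)))
print(sorted(usernames), sorted(item_names), len(deals), spent)
```

Assuming `Client.username` and `Item.name` are unique, the ORM side could then be roughly: `Client.objects.bulk_create([Client(username=u) for u in usernames], ignore_conflicts=True)` (likewise for `Item`), `Client.objects.in_bulk(usernames, field_name='username')` to map names back to saved rows, `Deal.objects.bulk_create(...)` for the deals, one `F('spent_money')` update per distinct client instead of per row, and `Client.gems.through.objects.bulk_create(..., ignore_conflicts=True)` for the gems, all inside `transaction.atomic()`. Keep in mind that `bulk_create` does not call `save()` or send `pre_save`/`post_save` signals, and that it accepts `batch_size` for very large files.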

Jason