Setting up a file upload stream scan using ClamAV in a Django back-end

Question

I'm working on a React/Django app. Users upload files through the React front-end, and the files end up in the Django/DRF back-end. We have antivirus (AV) running on the server constantly, but we want to add stream scanning so that each file is checked before it is written to disk.

Setting this up is a bit over my head. Here are a few sources I am looking at.

How do you virus scan a file being uploaded to your java webapp as it streams?

Although the accepted answer describes it as "... quite easy" to set up, I'm struggling.

Apparently I need to run cat testfile | clamscan - per the post and the corresponding documentation:

How do you virus scan a file being uploaded to your java webapp as it streams?

So if my back-end looks like the following:

class SaveDocumentAPIView(APIView):
    permission_classes = [IsAuthenticated]

    def post(self, request, *args, **kwargs):

        # this is for handling the files we do want
        # it writes the files to disk and writes them to the database
        for f in request.FILES.getlist('file'):
            max_id = Uploads.objects.all().aggregate(Max('id'))
            if max_id['id__max'] is None:
                max_id = 1
            else:
                max_id = max_id['id__max'] + 1
            data = {
                'user_id': request.user.id,
                'sur_id': kwargs.get('sur_id'),
                'co': User.objects.get(id=request.user.id).co,
                'date_uploaded': datetime.datetime.now(),
                'size': f.size
            }
            filename = str(data['co']) + '_' + \
                    str(data['sur_id']) + '_' + \
                    str(max_id) + '_' + \
                    f.name
            data['doc_path'] = filename
            self.save_file(f, filename)
            serializer = SaveDocumentSerializer(data=data)
            if serializer.is_valid(raise_exception=True):
                serializer.save()
        return Response(status=HTTP_200_OK)

    # Handling the document
    def save_file(self, file, filename):
        with open('fileupload/' + filename, 'wb+') as destination:
            for chunk in file.chunks():
                destination.write(chunk)

I think I need to add something to the save_file method like:

for chunk in file.chunks():
    # run Bash command from Python
    cat chunk | clamscan -
    if passes_clamscan:
        destination.write(chunk)
        return HttpResponse('It passed')
    else:
        return HttpResponse('Virus detected')

So my issues are:

1) How do I run the Bash command from Python?

2) How do I receive the scan result so that it can be sent back to the user and used on the back-end (for example, to email the user and the admin that the file had a virus)?

I have been toying with this, but without much luck.

Running Bash commands in Python

Furthermore, there are GitHub repos that claim to integrate ClamAV with Django, but they either haven't been updated in years or are poorly documented. See the following:

https://github.com/vstoykov/django-clamd

https://github.com/musashiXXX/django-clamav-upload

https://github.com/QueraTeam/django-clamav

It's unlikely that a chunk of a file is going to be detectable as a virus. The scanner is likely to need the entire file. — Douglas Leeder, May 24 '18 at 07:59
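
On issue 1, a shell is not strictly needed: Python's subprocess module can feed the upload's bytes to clamscan's standard input directly, which is the equivalent of cat testfile | clamscan -. The following is a minimal sketch, not a tested integration. The helper name scan_bytes and its command parameter are hypothetical; it assumes clamscan is on the PATH and relies only on clamscan's documented exit codes (0 = no virus found, 1 = virus found, 2 = error). In line with the comment above, it passes the whole file rather than individual chunks.

```python
import subprocess

# clamscan's documented exit codes: 0 = no virus found, 1 = virus found.
# Anything else (2 in the man page) is treated as a scanner error here.
CLEAN, INFECTED = 0, 1


def scan_bytes(data, command=("clamscan", "--no-summary", "-")):
    """Pipe `data` to the scanner's stdin and return (status, output).

    `scan_bytes` and `command` are hypothetical names for this sketch.
    `command` is overridable so the function can be exercised without
    ClamAV installed.
    """
    proc = subprocess.run(
        list(command),
        input=data,                # bytes written to the scanner's stdin
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold error messages into the output
        check=False,               # exit code 1 is a result, not a failure
    )
    output = proc.stdout.decode(errors="replace").strip()
    if proc.returncode == CLEAN:
        return "OK", output
    if proc.returncode == INFECTED:
        return "FOUND", output
    return "ERROR", output
```

Inside save_file this could be called as scan_bytes(file.read()) before the destination is opened, at the cost of holding the whole upload in memory. Because clamscan reloads its signature database on every invocation, a running clamd daemon is usually the faster choice for per-upload scanning.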

score 4 · Accepted Answer · answered May 24 '18 at 19:06

OK, I got this working with clamd. I modified my SaveDocumentAPIView to the following. It scans each file before it is written to disk and prevents it from being written if it is infected. Uninfected files in the same request are still saved, so the user doesn't have to re-upload them.

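import datetime

import clamd
from django.core.mail import send_mail
from django.db.models import Max
from rest_framework.permissions import IsAuthenticated
from rest_framework.response import Response
from rest_framework.status import HTTP_200_OK
from rest_framework.views import APIView

# Uploads, User and SaveDocumentSerializer come from the project's own modules.

class SaveDocumentAPIView(APIView):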
    permission_classes = [IsAuthenticated]

    def post(self, request, *args, **kwargs):

        # create array for files if infected
        infected_files = []

        # setup unix socket to scan stream
        cd = clamd.ClamdUnixSocket()

        # this is for handling the files we do want
        # it writes the files to disk and writes them to the database
        for f in request.FILES.getlist('file'):
            # scan stream
            scan_results = cd.instream(f)

            if scan_results['stream'][0] == 'OK':
                # start to create the file name
                max_id = Uploads.objects.all().aggregate(Max('id'))
                if max_id['id__max'] is None:
                    max_id = 1
                else:
                    max_id = max_id['id__max'] + 1
                data = {
                    'user_id': request.user.id,
                    'sur_id': kwargs.get('sur_id'),
                    'co': User.objects.get(id=request.user.id).co,
                    'date_uploaded': datetime.datetime.now(),
                    'size': f.size
                }
                filename = str(data['co']) + '_' + \
                        str(data['sur_id']) + '_' + \
                        str(max_id) + '_' + \
                        f.name
                data['doc_path'] = filename
                self.save_file(f, filename)
                serializer = SaveDocumentSerializer(data=data)
                if serializer.is_valid(raise_exception=True):
                    serializer.save()

            elif scan_results['stream'][0] == 'FOUND':
                send_mail(
                    'Virus Found in Submitted File',
                    'The user %s %s with email %s has submitted the following file ' \
                    'flagged as containing a virus: \n\n %s' % \
                    (
                        request.user.first_name,
                        request.user.last_name,
                        request.user.email,
                        f.name
                    ),
                    'The Company <no-reply@company.com>',
                    ['admin@company.com']
                )
                infected_files.append(f.name)

        return Response({'filename': infected_files}, status=HTTP_200_OK)

    # Handling the document
    def save_file(self, file, filename):
        with open('fileupload/' + filename, 'wb+') as destination:
            for chunk in file.chunks():
                destination.write(chunk)
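
One gap in the view above: a result that is neither 'OK' nor 'FOUND' (for example, a scanner error) falls through both branches, so the file is neither saved nor reported. Below is a small sketch of a helper that makes the three cases explicit. The name interpret_instream is hypothetical, and the result shape is inferred from the scan_results['stream'][0] checks above; treating the second tuple element as the signature name or error detail is an assumption about the clamd package's return value.

```python
def interpret_instream(result):
    """Normalise an instream-style result dict to (status, detail).

    Hypothetical helper: status is one of 'OK', 'FOUND' or 'ERROR'.
    Anything unexpected is reported as 'ERROR' so that the caller never
    silently drops a file.
    """
    status, detail = result.get("stream", ("ERROR", "missing 'stream' entry"))
    if status not in ("OK", "FOUND"):
        return "ERROR", detail
    return status, detail
```

In the view, the loop could branch on the returned status and, for 'ERROR', reject the file and notify the admin rather than skipping it silently.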

How has this implementation been working for you so far? Would you recommend this method for scanning images / PDFs? — bones225, Dec 18 '19 at 20:03
It works great and we haven't had any issues! Only a few infected files have been submitted so far. It shoots us an email when that happens so we can reach out to the user. — cjones, Dec 19 '19 at 15:02
That's great to hear. I'm working with just PDFs right now, so I am using PDFiD (but thinking about adding ClamAV). Why do you think there isn't a more highly rated or more active Python package for integrating ClamAV? It seems like something many web apps should have. I'm surprised I had to dig this hard to find a post from the last 3 years discussing it. — bones225, Dec 19 '19 at 18:06
