
I am uploading a file from the front end and trying to read it in the backend to extract some data from it. I have written the following code, which fails in every scenario:

Views.py

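import os

from django.shortcuts import render
from django.utils import timezone
from django.views import View

# UserInformationFrom (the form) and readHTML (DataExtract.py) are imported from the project's own modules

class UserInfo(View):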

    template_name = "Recruit/recruit.html"

    def get(self, request):
        user = UserInformationFrom()
        return render(request, self.template_name, {"form": user})

    def post(self, request):
        user = UserInformationFrom(request.POST, request.FILES)
        output = dict()
        HTMLExtensionList = ['.html','.htm']
        if user.is_valid():
            savedUser = user.save()
            filename = user['file'].data.name
            name, extension = os.path.splitext(filename)
            if extension.lower() in HTMLExtensionList:
                output = readHTML(filename=user['file'].data)
            savedUser.email = output['Email']
            savedUser.mobile = output['Phone']
            savedUser.Zipcode = output['zipCode']
            savedUser.state = output['state']
            savedUser.upload_by = request.user
            savedUser.updated = timezone.now()
            savedUser.save()
            return render(request, self.template_name, {"form": user})
        else:
            return render(request, self.template_name, {"form": user})

DataExtract.py

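from bs4 import BeautifulSoup

# ExtractEmail, findPhone, extractZipCode and extractState are the project's own helper functions

def readHTML(filename):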
    with open(filename, "r", encoding='utf-8') as file:
        soup = BeautifulSoup(file)
        for data in soup(['style', 'script']):
            data.decompose()
        var = ' '.join(soup.stripped_strings)
    email = ExtractEmail(var)
    phone = findPhone(var)
    zipCode = extractZipCode(var)
    state = extractState(var)
    return {"Email": email, "Phone": phone, "zipCode": zipCode, "state": state}

I am getting the following error:

expected str, bytes or os.PathLike object, not InMemoryUploadedFile

The error is raised in DataExtract.py, at the line where I open the file. I tried the solution from the following question, but it still doesn't work:

expected str, bytes or os.PathLike object, not InMemoryUploadedFile

Wasim
  • By the way: you'll have a better time using a `FormView` to process a form rather than doing it by hand. – AKX Mar 08 '22 at 12:50

2 Answers


Well, since your readHTML function expects a filename, you'd need to pass it one, not the uploaded file object itself (an InMemoryUploadedFile is held in memory and has no path on disk).
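The mismatch can be reproduced without Django: `open()` only accepts a path (or an integer file descriptor), so handing it a file object raises the same kind of TypeError. This is a minimal sketch that uses `io.BytesIO` as a stand-in for `InMemoryUploadedFile` (an assumption made only so it runs without Django); `open_error_message` is a hypothetical helper name.

```python
import io


def open_error_message(obj):
    """Return the TypeError message open() raises for a non-path argument, or None."""
    try:
        open(obj, "r", encoding="utf-8")  # open() wants a path or an int file descriptor
    except TypeError as exc:
        return str(exc)
    return None


# io.BytesIO stands in for Django's InMemoryUploadedFile: both are in-memory file objects
upload = io.BytesIO(b"<p>hello</p>")
print(open_error_message(upload))  # the "expected str, bytes or os.PathLike object" error
print(upload.read())               # a file object is simply read; no open() needed
```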

Refactor readHTML to a function that can read its input from just a string:

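from bs4 import BeautifulSoup


def read_html_string(s):
    # s can be str or bytes (e.g. the result of an uploaded file's .read())
    soup = BeautifulSoup(s, "html.parser")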
    for data in soup(["style", "script"]):
        data.decompose()
    var = " ".join(soup.stripped_strings)
    email = ExtractEmail(var)
    phone = findPhone(var)
    zipCode = extractZipCode(var)
    state = extractState(var)
    return {"Email": email, "Phone": phone, "zipCode": zipCode, "state": state}

# If you still need this for something...
def readHTML(filename):
    with open(filename, "r", encoding="utf-8") as file:
        return read_html_string(file.read())

Then just do

output = read_html_string(user['file'].data.read())

in your view function.
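The point of the refactor is that parsing only needs the markup itself, as `str` or `bytes`, never a path. As a dependency-free illustration of that idea, here is a sketch that swaps in the standard library's `html.parser.HTMLParser` for BeautifulSoup (a different technique from the one in this answer); `html_to_text` is a hypothetical name that roughly mimics `' '.join(soup.stripped_strings)` after dropping `<style>` and `<script>`.

```python
from html.parser import HTMLParser


class _VisibleTextParser(HTMLParser):
    """Collects stripped text nodes, skipping anything inside <style> or <script>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <style>/<script>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("style", "script"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("style", "script") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(markup):
    """Return visible text joined by single spaces; accepts str or UTF-8 bytes."""
    if isinstance(markup, bytes):
        markup = markup.decode("utf-8")  # uploaded files are read as bytes
    parser = _VisibleTextParser()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    return " ".join(parser.chunks)
```

It would be called the same way as `read_html_string`, with the bytes returned by the upload's `.read()`.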

AKX
  • I have already tried both solutions, but it's still not working. With the first solution, BeautifulSoup doesn't read any value from the document, and the second solution returns a `No such file or directory: b''` error. – Wasim Mar 08 '22 at 13:01
  • What... both solutions? This is a single solution in two parts. – AKX Mar 08 '22 at 13:09
  • Sorry, my mistake. When I execute the `read_html_string` function, BeautifulSoup reads ''. I am not able to get the data contained in the file. – Wasim Mar 08 '22 at 13:19
  • You may need to add `user['file'].data.seek(0)` before that invocation. – AKX Mar 08 '22 at 13:25
  • Still not working. For PDF and DOCX files it works fine, but for those I used the `docx` and `pdfplumber` modules. I only run into issues when I try to read the file directly. – Wasim Mar 08 '22 at 14:29
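Why the `seek(0)` suggestion matters: a file object keeps a read position, and once something has read it to the end (saving the form, which copies the upload into storage, can do this), another `.read()` returns `b''`. That would match BeautifulSoup seeing an empty document. This sketch uses `io.BytesIO` as a stand-in for the uploaded file; `read_from_start` is a hypothetical helper name.

```python
import io


def read_from_start(fileobj):
    """Return the whole content of a file object regardless of its current position."""
    fileobj.seek(0)        # rewind: an earlier consumer may have left us at EOF
    data = fileobj.read()
    fileobj.seek(0)        # rewind again so later code can read it too
    return data


upload = io.BytesIO(b"<p>Phone: 555-0100</p>")  # stand-in for the uploaded file
upload.read()                                    # simulate something consuming the file first
print(upload.read())                             # b'' : already at end of file
print(read_from_start(upload))                   # b'<p>Phone: 555-0100</p>'
```

Another option along the same lines is to read the upload before calling `user.save()`, so nothing has consumed it yet.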

Try to pass the InMemoryUploadedFile directly to the BeautifulSoup class like this:

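from bs4 import BeautifulSoup


def readHTML(file):
    # BeautifulSoup reads file-like objects itself, so no open() is needed
    soup = BeautifulSoup(file, "html.parser")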
    for data in soup(['style', 'script']):
        data.decompose()
    var = ' '.join(soup.stripped_strings)
    email = ExtractEmail(var)
    phone = findPhone(var)
    zipCode = extractZipCode(var)
    state = extractState(var)
    return {"Email": email, "Phone": phone, "zipCode": zipCode, "state": state}

The error clearly comes from this line: `with open(filename, "r", encoding='utf-8') as file`. `open()` expects a path, but the upload is already a file-like object, so you don't need to call `open()` to read it.

source: https://tutorialmeta.com/question/expected-str-bytes-or-os-pathlike-object-not-inmemoryuploadedfile
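If `readHTML` should keep accepting a path for other callers while also taking uploads, one option is to dispatch on the argument type before anything reaches `open()`. This is a sketch under that assumption; `read_markup` is a hypothetical name, and paths are assumed to be given as `str` or `os.PathLike` (not `bytes`).

```python
import io
import os


def read_markup(source):
    """Return the text of an HTML document given either a path or a file object.

    Paths (str or os.PathLike) are opened; anything else is assumed to have a
    .read() method (such as an uploaded file) and is read directly, so open()
    never sees a file object.
    """
    if isinstance(source, (str, os.PathLike)):
        with open(source, "r", encoding="utf-8") as fh:
            return fh.read()
    data = source.read()
    # Uploaded files are read as bytes, so decode them to text
    return data.decode("utf-8") if isinstance(data, bytes) else data


print(read_markup(io.BytesIO(b"<p>upload</p>")))  # <p>upload</p>
```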

movileanuv