
I was using Streamlit for my API before, and after a file was submitted I got a BytesIO object, which I could successfully use in my Python script (PDFMiner, tesseract-ocr, etc.). However, I had to switch to FastAPI, and I am lost as to what data type I am receiving.

@app.post("/file_upload", response_model=ResponseModel)
async def upload(pdf: UploadFile = File(...)) -> ResponseModel:
    if pdf is not None:
        text = TextExtraction.extract_text(pdf)

TextExtraction:

def extract_text(pdf):
    try:
        text = extract_text(pdf.file)
        text = re.sub(r'[^A-Za-z0-9 :,@$€+.\n-()-]+', '', text)
        texts = texts + '\n' + text
    except:
        pass

I know how to make the script work with BytesIO, but the UploadFile class is new to me and I am confused, having found little help online. Can someone help me either convert the TempFile to BytesIO or adjust my PDFMiner/pytesseract code to accept a TempFile?

I tried switching UploadFile to the bytes class, and also tried using the filename (but since it's a TempFile, it is not saved, so I guess there isn't a path).

  • See related answers [here](https://stackoverflow.com/a/70665801/17865804), [here](https://stackoverflow.com/a/74165295/17865804), [here](https://stackoverflow.com/a/71886990/17865804), [here](https://stackoverflow.com/a/70653605/17865804), as well as [here](https://stackoverflow.com/a/71766160/17865804), [here](https://stackoverflow.com/a/73811351/17865804) and [here](https://stackoverflow.com/a/70657621/17865804). – Chris Nov 26 '22 at 17:47
  • `extract_text(pdf.file)` This function is called `extract_text()`. Doesn't that mean it will just call itself forever and forever? `except: pass` Don't do this. Ignoring exceptions means that your program will be harder to debug. The exception is there to help you figure out what's going wrong. – Nick ODell Nov 26 '22 at 17:53
  • thank you, Chris. Found my answer there! Although it wasn't the Pandas DF one. Also thank you, Nick, I'll keep that in mind! – Sin of Greed Nov 28 '22 at 12:11
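
For reference, the conversion asked about in the question can be sketched roughly as follows. This is a minimal illustration, not the solution the asker ended up using (the thread does not say which linked answer helped). It relies on the fact that FastAPI's `UploadFile` exposes the underlying temporary file as `.file`; the helper name `to_bytesio` is hypothetical, and a `SpooledTemporaryFile` stands in for a real upload so the snippet runs without a server.

```python
import io
import tempfile
from typing import BinaryIO


def to_bytesio(file_obj: BinaryIO) -> io.BytesIO:
    """Copy a binary file-like object into an in-memory BytesIO buffer.

    Hypothetical helper: in a FastAPI endpoint, `file_obj` would be
    `pdf.file`, the temporary file behind an UploadFile parameter.
    """
    file_obj.seek(0)                      # rewind in case it was already read
    buffer = io.BytesIO(file_obj.read())  # load the whole upload into memory
    buffer.seek(0)                        # hand back a stream positioned at the start
    return buffer


# Stand-in for UploadFile.file, used here only to make the sketch runnable.
spooled = tempfile.SpooledTemporaryFile(max_size=1024 * 1024)
spooled.write(b"%PDF-1.4 example bytes")

pdf_stream = to_bytesio(spooled)
print(type(pdf_stream).__name__, len(pdf_stream.getvalue()))
```

Inside the endpoint from the question, this would presumably be called as `to_bytesio(pdf.file)`; an equivalent one-liner in an `async def` route is `io.BytesIO(await pdf.read())`. Either way the whole file is held in memory, which may not suit very large uploads. As Nick ODell's comment notes, the extraction function should also not share its name with the library function it calls, and the bare `except: pass` would hide any error raised here.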
