I was using Streamlit for my API before. After a file was submitted I got a BytesIO object, which I could successfully process in my Python script (PDFMiner, tesseract-ocr, etc.). However, I had to switch to FastAPI, and I am lost as to what data type I am now receiving:
@app.post("/file_upload", response_model=ResponseModel)
async def upload(pdf: UploadFile = File(...)) -> ResponseModel:
    if pdf is not None:
        text = TextExtraction.extract_text(pdf)
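class TextExtraction:
    @staticmethod
    def extract_text(pdf):
        texts = ''
        try:
            # the inner extract_text is the PDFMiner function, not this method
            text = extract_text(pdf.file)
            text = re.sub(r'[^A-Za-z0-9 :,@$€+.\n-()-]+', '', text)
            texts = texts + '\n' + text
        except:
            pass
        return texts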
I know how to make the script work with BytesIO, but the UploadFile class is new to me and I am confused, having found little of help online. Can someone either help me convert the TempFile to BytesIO, or help me adjust my PDFMiner/pytesseract code to accommodate the TempFile?
I tried switching UploadFile to the bytes class, and also tried using the filename (but since it's a TempFile it is not saved, so I guess there isn't a path).
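To make the conversion being asked about concrete, here is a minimal sketch using only the standard library. It assumes that `pdf.file` on an UploadFile behaves like a `tempfile.SpooledTemporaryFile` (which is how the FastAPI docs describe it); the helper name `to_bytesio` and the fake PDF bytes are hypothetical and only there for illustration.

```python
import io
import tempfile


def to_bytesio(fileobj):
    """Copy a binary file-like object into an in-memory BytesIO buffer."""
    fileobj.seek(0)  # rewind in case something has already read from it
    return io.BytesIO(fileobj.read())


# Stand-in for `pdf.file` (assumption: UploadFile exposes a SpooledTemporaryFile).
# Inside the endpoint you would pass pdf.file instead of this object.
upload = tempfile.SpooledTemporaryFile()
upload.write(b"%PDF-1.4 fake content")

buffer = to_bytesio(upload)
print(buffer.getvalue())
```

Inside the async endpoint, `io.BytesIO(await pdf.read())` should give an equivalent buffer, which could then be handed to the existing BytesIO-based PDFMiner/pytesseract code unchanged. Copying into memory is the simplest option for small PDFs; for large uploads it may be preferable to pass `pdf.file` directly after a `seek(0)`.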