I'm developing a Streamlit application where users can upload PDF documents. The uploaded documents are stored in st.session_state.pdfs. Once the user uploads the documents and presses a button, the application processes the documents. Here's the relevant part of the code:
def elaborate_documents(embeddings_type):
    st.session_state.pdf_elaborated_flag = True
    with st.spinner("Training in progress.."):
        chunks, titles = get_pdf_text(st.session_state.pdfs)
        # Documents list
        st.subheader("Your documents:")
        if st.session_state.pdfs == None:
            st.write("No document uploaded")
        else:
            store_name = st.session_state.pdfs[0].name[:-4]
            st.write(f'{titles}')
            # Embed the chunks
            VectorStore = get_text_embedding(store_name, chunks, embeddings_type)
            # create conversation chain
            st.session_state.conversation = get_conversiont_chain(VectorStore)
...
def main():
    embeddings_type = "SentenceTransformer-embeddings"
    if "pdfs" not in st.session_state:
        st.session_state.pdfs = None
    ...
    with st.sidebar:
        st.session_state.pdfs = st.file_uploader("Drag or press Browse files here to upload your documents:", type="pdf", accept_multiple_files=True)
        if st.button("Process the documents", type="primary", on_click=elaborate_documents(embeddings_type)):
            pass
The issue I'm facing is that, at startup, Streamlit throws an IndexError: list index out of range error pointing to the line store_name = st.session_state.pdfs[0].name[:-4]. I believe this error happens because st.session_state.pdfs is empty at startup, but I'm confused as this line of code is supposed to be executed only when the user clicks a button.
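To illustrate the suspicion about the empty state: as far as I can tell, st.file_uploader with accept_multiple_files=True returns an empty list (not None) when nothing has been uploaded, so a comparison with None would not catch it. Below is a minimal sketch in plain Python. FakeUpload and derive_store_name are hypothetical names used only for illustration; FakeUpload stands in for Streamlit's uploaded-file objects and models only the name attribute.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Sequence


@dataclass
class FakeUpload:
    """Hypothetical stand-in for an uploaded file; only `name` is modeled."""
    name: str


def derive_store_name(pdfs: Optional[Sequence[FakeUpload]]) -> Optional[str]:
    """Return the first file's name without its extension, or None if no files."""
    # `not pdfs` is True for both None and an empty list,
    # so the [0] index below is only reached when a file exists.
    if not pdfs:
        return None
    # Path(...).stem drops the extension without assuming it is 4 characters long.
    return Path(pdfs[0].name).stem


print(derive_store_name(None))                        # None
print(derive_store_name([]))                          # None
print(derive_store_name([FakeUpload("report.pdf")]))  # report
```

A check of this shape would cover both the initial None value and the empty list, whereas == None only covers the first.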
The complete error is:
IndexError: list index out of range
Traceback:
File "/home/fd/.venv/pdfchat/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
File "/home/fd/repo/PDFchat/app.py", line 151, in <module>
    main()
File "/home/fd/repo/PDFchat/app.py", line 145, in main
    if st.button("Process the documents", type="primary", on_click=elaborate_documents(embeddings_type)):
                                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fd/repo/PDFchat/app.py", line 95, in elaborate_documents
    store_name = st.session_state.pdfs[0].name[:-4]
                 ~~~~~~~~~~~~~~~~~~~~~^^^
So, my question is: why is this error thrown at startup, before the user has the chance to upload any documents and press the button? Is the function elaborate_documents(embeddings_type) executed at startup? How can I prevent the execution of this function until the user presses the button?
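The traceback suggests the call happens while the arguments of st.button(...) are being evaluated: writing on_click=elaborate_documents(embeddings_type) calls the function immediately and passes its return value, rather than passing the function itself. Below is a minimal sketch of that difference in plain Python. fake_button is a hypothetical stand-in for a widget that accepts a callback; it is not part of Streamlit, and the clicked parameter only simulates a user click.

```python
from typing import Any, Callable, List, Optional, Tuple

calls: List[str] = []  # records every time the callback actually runs


def elaborate_documents(embeddings_type: str) -> None:
    calls.append(embeddings_type)


def fake_button(label: str,
                on_click: Optional[Callable[..., Any]] = None,
                args: Tuple[Any, ...] = (),
                clicked: bool = False) -> bool:
    """Hypothetical widget: runs the callback only when `clicked` is True."""
    if clicked and on_click is not None:
        on_click(*args)
    return clicked


# Call form: Python evaluates elaborate_documents("eager") before fake_button
# is entered, so the function runs even though the button is never clicked.
# on_click then receives the return value, which is None.
fake_button("Process", on_click=elaborate_documents("eager"))
print(calls)  # ['eager']

# Reference form: the function is passed uncalled, with its arguments separate.
fake_button("Process", on_click=elaborate_documents, args=("deferred",))
print(calls)  # ['eager']  (nothing new: no click happened)

fake_button("Process", on_click=elaborate_documents, args=("deferred",), clicked=True)
print(calls)  # ['eager', 'deferred']
```

In Streamlit the corresponding form would presumably be st.button("Process the documents", type="primary", on_click=elaborate_documents, args=(embeddings_type,)), since st.button accepts on_click together with args and kwargs.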
Full code in this repo: GitHub PDFchat repo
I tried to work around the issue by moving the elaborate_documents(embeddings_type) call out of the button's on_click argument, so that it runs directly in response to the button press. However, the behavior changed: the user now has to click the button twice before the function executes. I'm not experienced with web development, so apologies if this is a basic question.