
I'm developing a Streamlit application where users can upload PDF documents. The uploaded documents are stored in st.session_state.pdfs. Once the user uploads the documents and presses a button, the application processes the documents. Here's the relevant part of the code:

def elaborate_documents(embeddings_type):
    st.session_state.pdf_elaborated_flag = True
    with st.spinner("Training in progress.."):
        chunks, titles = get_pdf_text(st.session_state.pdfs)
        # Documents list
        st.subheader("Your documents:")
        if st.session_state.pdfs is None:
            st.write("No document uploaded")
        else:
            store_name = st.session_state.pdfs[0].name[:-4]
            st.write(f'{titles}')
            # Embed the chunks
            VectorStore = get_text_embedding(store_name, chunks, embeddings_type)
            # create conversation chain
            st.session_state.conversation = get_conversiont_chain(VectorStore)

...
def main():
    embeddings_type = "SentenceTransformer-embeddings"

    if "pdfs" not in st.session_state:
        st.session_state.pdfs = None

...
with st.sidebar:
    st.session_state.pdfs = st.file_uploader("Drag or press Browse files here to upload your documents:", type="pdf", accept_multiple_files=True)
    if st.button("Process the documents", type="primary", on_click=elaborate_documents(embeddings_type)):
        pass

The issue I'm facing is that, at startup, Streamlit throws an IndexError: list index out of range error pointing to the line store_name = st.session_state.pdfs[0].name[:-4]. I believe this error happens because st.session_state.pdfs is empty at startup, but I'm confused as this line of code is supposed to be executed only when the user clicks a button.

The complete error is

IndexError: list index out of range
Traceback:
File "/home/fd/.venv/pdfchat/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
File "/home/fd/repo/PDFchat/app.py", line 151, in <module>
    main()
File "/home/fd/repo/PDFchat/app.py", line 145, in main
    if st.button("Process the documents", type="primary", on_click=elaborate_documents(embeddings_type)):
                                                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/fd/repo/PDFchat/app.py", line 95, in elaborate_documents
    store_name = st.session_state.pdfs[0].name[:-4]
                 ~~~~~~~~~~~~~~~~~~~~~^^^

So, my question is: why is this error thrown at startup, before the user has the chance to upload any documents and press the button? Is the function elaborate_documents(embeddings_type) executed at startup? How can I prevent the execution of this function until the user presses the button?

Full code in this repo: GitHub PDFchat repo

I tried to address the issue by moving the elaborate_documents(embeddings_type) call out of the button's on_click argument, so that the function is tied directly to the button press. That changed the behavior, but now the user has to click the button twice before the function executes. I'm not experienced with web development, so apologies if this is a basic question.


1 Answer


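The problem is that the callback elaborate_documents is called as soon as the if statement is evaluated, on every run of the script, not when the button is clicked. Writing elaborate_documents(embeddings_type) with parentheses calls the function immediately and passes its return value (None) as on_click. You should pass only the function name: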

st.button("Process the documents", type="primary", on_click=elaborate_documents)

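If you need to pass arguments to the on_click callback, st.button accepts args and kwargs parameters that are forwarded to it, for example args=(embeddings_type,). Please also refer to this post, which is not about Streamlit, but the principles are the same.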
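To make the difference concrete, here is a minimal plain-Python sketch. It does not import Streamlit: fake_button is a hypothetical stand-in that only mimics how a button with on_click and args behaves, and elaborate_documents is reduced to a stub that records when it runs. Both are assumptions for illustration, not the code from the question.

```python
from typing import Any, Callable, Optional

calls: list[str] = []  # records every time the callback actually runs


def elaborate_documents(embeddings_type: str) -> None:
    """Stub for the question's callback; it only records that it ran."""
    calls.append(embeddings_type)


def fake_button(
    label: str,
    on_click: Optional[Callable[..., Any]] = None,
    args: tuple = (),
    clicked: bool = False,
) -> bool:
    """Hypothetical stand-in for st.button: runs on_click only on a click."""
    if clicked and on_click is not None:
        on_click(*args)
    return clicked


embeddings_type = "SentenceTransformer-embeddings"

# Buggy form: the parentheses call elaborate_documents right now, while the
# arguments to fake_button are being built. on_click receives None.
fake_button("Process the documents", on_click=elaborate_documents(embeddings_type))
calls_after_buggy_render = list(calls)  # the callback already ran, with no click

calls.clear()

# Fixed form: pass the function object and its arguments separately.
fake_button("Process the documents", on_click=elaborate_documents, args=(embeddings_type,))
calls_after_render = list(calls)  # nothing ran yet

# Simulate the user clicking the button.
fake_button(
    "Process the documents",
    on_click=elaborate_documents,
    args=(embeddings_type,),
    clicked=True,
)
calls_after_click = list(calls)  # the callback ran exactly once
```

In the actual app, the equivalent call would be st.button("Process the documents", type="primary", on_click=elaborate_documents, args=(embeddings_type,)). Wrapping the call in a lambda or functools.partial achieves the same deferral.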

matleg