Writing variables to a pandas DataFrame with a header

Question

I have a script that runs API calls against a Flask app. I want to create a pandas DataFrame with the status code and the elapsed time of each request, which I can write to a CSV file. My problem is that only one entry ends up in the CSV file and I don't know why. The headers should be "statuscode" and "elapsed time". When I print the statuscode and elapsedtime variables, every response is printed, not just one.

With this CSV file I want to create a graph to visualize the response times.

I tried to write the "write_df" function but ended up using the variables from the requests in the "send_api_request" function.

import requests
import datetime
import concurrent.futures
import csv
import pandas as pd

HOST = 'http://127.0.0.1:5000'
API_PATH = '/'
ENDPOINT = HOST + API_PATH
MAX_THREADS = 8
CONCURRENT_THREADS = 10

csv_path = "flasktests.csv"
try:
    file = open(csv_path, 'w', newline='')
    writer = csv.writer(file)
except:
    print("error opening or writing to the CSV file!")


def send_api_request():
    try:
        #print ('Sending API request: ', ENDPOINT)
        r = requests.get(ENDPOINT)
        if r.status_code == 200:
            #print('Received: ', r.status_code, r.elapsed)
            responses = {"statuscode":[r.status_code], "elapsed time": [r.elapsed]}
            statuscode = r.status_code
            elapsedtime = r.elapsed
            print(statuscode, elapsedtime)
            df = pd.DataFrame([statuscode,elapsedtime], columns=["statuscode","elapsed time"])
            df.to_csv(csv_path, index=False)

        elif r.status_code == 417:
            print('Received error code:', r.status_code, r.json())

    except Exception as e:
        print("error",str(e))
    

def write_df(statuscode, elapsedtime):
    print(statuscode,elapsedtime)
    df = pd.DataFrame({"statuscode":[statuscode], "elapsed time": [elapsedtime]})
    df.to_csv(csv_path, index=False)
    print(df)

with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_THREADS) as executor:
    futures = [executor.submit(send_api_request) for x in range (CONCURRENT_THREADS)]
    executor.shutdown(wait=True)

Any ideas what I am doing wrong here? Thank you!

score 0 · Accepted Answer · answered Oct 14 '22 at 14:06

df = pd.DataFrame([statuscode,elapsedtime], columns=["statuscode","elapsed time"])
df.to_csv(csv_path, index=False)

Here, you are writing a CSV with only one entry, because your DataFrame df only has one row. And every time this function is executed, it overwrites the existing CSV. This is why the output file only contains one line (two if you count the header, which should be there).
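To see the overwrite in isolation, here is a minimal sketch that needs no Flask app. The `demo_path` file and the sample status codes are made up for illustration; the only behavior relied on is that `to_csv` opens the file in write mode by default, which truncates whatever was there before.

```python
import os
import tempfile

import pandas as pd

# Hypothetical scratch file, used only for this demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")

# Each call builds a one-row frame and writes it with the default mode="w",
# which truncates the file, so only the last row survives.
for code in (200, 200, 417):
    pd.DataFrame({"statuscode": [code]}).to_csv(demo_path, index=False)

overwritten = pd.read_csv(demo_path)
print(len(overwritten))  # 1

# Collecting the rows first and writing once keeps all of them.
rows = [{"statuscode": code} for code in (200, 200, 417)]
pd.DataFrame(rows).to_csv(demo_path, index=False)

collected = pd.read_csv(demo_path)
print(len(collected))  # 3
```

Appending with `mode="a"` would also keep earlier rows, but with several threads writing at once it brings back the locking problem, so collecting first is usually simpler.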

I see you are trying to use multithreading, which is probably not needed. If you do want to keep using it, you will need to make sure that you are not writing to the file with two threads at the same time, which is a lot of overhead. Instead, I would suggest you do something like this:

def send_api_request():
    try:
        #print ('Sending API request: ', ENDPOINT)
        r = requests.get(ENDPOINT)
        if r.status_code == 200:
            #print('Received: ', r.status_code, r.elapsed)
            return r.status_code, r.elapsed

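        elif r.status_code == 417:
            print('Received error code:', r.status_code, r.json())

        # Any other status: return the code without an elapsed time
        return r.status_code, None

    except Exception as e:
        print("error",str(e))
        # Return a pair here too, so the caller can always unpack the result
        return None, None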

# number of times you want to call the API
nb_api_calls = 10
codes = []
elapsed_times = []
for i in range(nb_api_calls):
    code, elapsed_time = send_api_request()
    if elapsed_time is None:
        # An error happened, choose what to do with that information
        # Here I am just skipping it
        pass
    else:
        codes.append(code)
        elapsed_times.append(elapsed_time)
df = pd.DataFrame({"statuscode": codes,"elapsed time": elapsed_times})
df.to_csv(csv_path, index=False)

Note that, with the current configuration, the "statuscode" column will only ever contain the value 200.

Thank you! The script worked. As you mentioned, there are only 200 codes in the CSV file. Do you know how I can catch any other error? In some of my requests I am getting this one: ```error HTTPConnectionPool(host='127.0.0.1', port=5000): Max retries exceeded with url: / (Caused by NewConnectionError(': Failed to establish a new connection: [WinError 10048]``` which I want to capture as well. — YannickMetz, Oct 14 '22 at 15:28
And I need to keep the multithreading part, because I want to do some kind of performance testing that simulates user requests on this Flask app. — YannickMetz, Oct 15 '22 at 10:42
In the code I included, if the status code is not `200`, we do not return an `elapsed_time`, and afterwards, in the code that writes to the CSV, we ignore all requests that don't have an `elapsed_time`. In your use case, maybe you don't have to check the code of the response from the server, but can directly `return r.status_code, r.elapsed` so you can get all the statistics you want. — Florent Monin, Oct 17 '22 at 08:16
Regarding the multithreading, you can make it work if you get the result from each thread, then write it to the DataFrame, and later to the CSV. See [this question](https://stackoverflow.com/q/6893968/20121320) for an example of how to get a return value from a thread. — Florent Monin, Oct 17 '22 at 08:18
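Putting the two comments above together, one possible sketch (not the answerer's own code) keeps the thread pool from the question, has each worker return its result instead of writing to the file, and records connection errors as rows too. The helper names `run_requests` and `main`, the extra "error" column, the 10 second timeout, and storing the elapsed time in seconds are all assumptions added for illustration.

```python
import concurrent.futures

import pandas as pd
import requests

ENDPOINT = "http://127.0.0.1:5000/"  # HOST + API_PATH from the question
CSV_PATH = "flasktests.csv"          # csv_path from the question


def send_api_request(endpoint=ENDPOINT):
    """Return one result row; request errors become rows instead of raising."""
    try:
        r = requests.get(endpoint, timeout=10)  # timeout value is an assumption
        return {"statuscode": r.status_code,
                "elapsed time": r.elapsed.total_seconds(),
                "error": None}
    except requests.exceptions.RequestException as e:
        # A failed connection has no status code and no elapsed time.
        return {"statuscode": None, "elapsed time": None, "error": str(e)}


def run_requests(request_fn, n_calls, max_workers):
    """Call request_fn n_calls times on a thread pool and collect the rows."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = [executor.submit(request_fn) for _ in range(n_calls)]
        # result() blocks until the corresponding worker has finished.
        return [future.result() for future in futures]


def main():
    rows = run_requests(send_api_request, n_calls=10, max_workers=8)
    df = pd.DataFrame(rows, columns=["statuscode", "elapsed time", "error"])
    # The file is written once, from the main thread, so no locking is needed.
    df.to_csv(CSV_PATH, index=False)
```

Calling `main()` with the Flask app running should give one CSV row per request, including the ones that failed to connect. Because only the main thread touches the file, the workers never compete for it.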
