I am trying to speed up a function that runs on every row of a dataframe. I originally used iterrows, but it was slow, so I switched to apply. That was a clear improvement, but I would like to use np.vectorize for better performance. My problem is how to pass the columns of the dataframe to the function, given that the number of columns and their names can vary. How can I pass the columns and then iterate through them? I guess I can use *args in the function signature, but how do I pass and unpack the columns of the df? I also don't want to make copies of the df (assume it is extremely large). I hope my question is clear.
For example, let's say I just want to print the data in some format, like so:
import pandas as pd

data = [{"a": str(x), "b": x, "c": x} for x in range(10)]
df = pd.DataFrame(data)

def func(row):
    # Print each row as JSON, followed by ":" and no newline
    print(f"{row.to_json()}:", end="")

_ = df.apply(func, axis=1)
This was much faster than iterrows, but how can I improve it further? Even for this example, assume the number of columns and their names can vary.
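To make the *args idea concrete, this is roughly the shape I have in mind. It is only a sketch under my own assumptions: format_row and cols are names I made up for illustration, and I build a string per row instead of printing so the result can be inspected.

```python
import numpy as np
import pandas as pd

data = [{"a": str(x), "b": x, "c": x} for x in range(10)]
df = pd.DataFrame(data)

# Column names are read at run time, so their number and names can vary.
cols = list(df.columns)

def format_row(*values):
    # values holds one scalar per column, in the same order as cols
    return ", ".join(f"{name}={value}" for name, value in zip(cols, values))

# otypes is given explicitly so np.vectorize does not have to probe the
# first element to guess the output type.
vec = np.vectorize(format_row, otypes=[object])

# Unpack one array per column. Note: to_numpy() is not guaranteed to be
# copy-free, so this may still copy some columns.
out = vec(*(df[c].to_numpy() for c in cols))
```

Here out is an object array with one formatted string per row. As far as I understand, the NumPy docs describe np.vectorize as a convenience wrapper that is essentially a loop, so I am not sure how much this actually gains over apply.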