There is a very popular answer on stackoverflow that you should not iterate over Pandas' dataframes:
https://stackoverflow.com/a/55557758/11826257
In my case, I want to take values from two columns of a dataframe and create a list of SQL INSERT INTO...
statements with them. Like this:
import pandas as pd
df = pd.DataFrame({'velocity':[12,10,15], 'color':['blue','green','yellow']})
mylist = list()
for index, row in df.iterrows():
mylist.append('INSERT INTO mytable(velocity, color) VALUES (' + \
str(row['velocity']) + \
', "' + \
str(row['color']) + \
'");' )
[print(x) for x in mylist]
# INSERT INTO mytable(velocity, color) VALUES (12, "blue");
# INSERT INTO mytable(velocity, color) VALUES (10, "green");
# INSERT INTO mytable(velocity, color) VALUES (15, "yellow");
I understand that I could write something like this:
[mylist.append('INSERT INTO mytable(velocity) VALUES ('+ str(x) + ');') for x in df["velocity"]]
if I were only interested in one column. But is that what is meant by "vectorization"? And how would it apply to a case where you need two items from each row of a pandas' dataframe?