I'm basically trying to one hot encode a column with values like this:
tickers
1 [DIS]
2 [AAPL,AMZN,BABA,BAY]
3 [MCDO,PEP]
4 [ABT,ADBE,AMGN,CVS]
5 [ABT,CVS,DIS,ECL,EMR,FAST,GE,GOOGL]
...
First I got all the set of all the tickers(which is about 467 tickers):
all_tickers = list()
for tickers in df.tickers:
for ticker in tickers:
all_tickers.append(ticker)
all_tickers = set(all_tickers)
Then I implemented One Hot Encoding this way:
for i in range(len(df.index)):
for ticker in all_tickers:
if ticker in df.iloc[i]['tickers']:
df.at[i+1, ticker] = 1
else:
df.at[i+1, ticker] = 0
The problem is the script runs incredibly slow when processing about 5000+ rows. How can I improve my algorithm?