I have a dataframe called data, one column of which contains strings. I want to split these strings into their individual characters, because my goal is to one-hot encode them and make them usable for classification. The column containing the strings is stored in predictors as follows:
predictors = pd.DataFrame(data, columns = ['Sequence']).to_numpy()
The result upon printing is:
[['DKWL']
['FCHN']
['KDQP']
...
['SGHC']
['KIGT']
['PGPT']]
while my goal is to get something like:
[['D', 'K', 'W', 'L']
...
['P', 'G', 'P', 'T']]
which from my understanding is a more appropriate form for one-hot encoding.
I have already tried the answers to "How do I convert string characters into a list?" and "How to create a list with the characters of a string?", without success.
Specifically, I also tried this:
for row in predictors:
    row = list(row)
but the result is in the same form as predictors, i.e.
[['DKWL']
['FCHN']
['KDQP']
...
['SGHC']
['KIGT']
['PGPT']]
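For reference, here is a minimal, self-contained sketch of the setup and of the attempt above. The three-row data frame is made up for illustration (the real data is not shown), and split_predictors is a hypothetical name. The last step is only one possible way to reach the target form, and it assumes every sequence has the same length:

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real `data` frame (assumption for illustration)
data = pd.DataFrame({"Sequence": ["DKWL", "FCHN", "KDQP"]})

predictors = pd.DataFrame(data, columns=["Sequence"]).to_numpy()

# The attempt from the question: each `row` is a 1-element array such as
# ['DKWL'], so list(row) is still ['DKWL'], and rebinding the loop
# variable never writes anything back into `predictors`.
for row in predictors:
    row = list(row)

# One possible approach: call list() on the string inside each row
# (row[0]) and collect the results into a new array.
split_predictors = np.array([list(row[0]) for row in predictors])

print(predictors.shape)        # one column holding whole strings
print(split_predictors.shape)  # one column per character
print(split_predictors)
```

Building a new array instead of assigning inside the loop sidesteps the fact that predictors has a single column and cannot hold four values per row.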