I have a data frame (df) with a string column called supported_cpu. Each value in supported_cpu is a comma-separated string of CPU names. I want to use this data in an ML model.
As a first step, I extracted the unique values of the supported_cpu column. The output is a list of unique values.
def pars_string(df, col):
    # Take the distinct strings in the column. (value_counts().reset_index()
    # only names its first column 'index' in pandas < 2.0, so avoid relying on it.)
    values = df[col].dropna().unique()
    # Create a list of all items, splitting each string on commas
    items = []
    for value in values:
        for item in value.split(","):
            # Strip leading/trailing spaces so " x" and "x" count as the same value
            items.append(item.strip())
    # Keep only the unique values
    return list(set(items))

supported_cpu_list = pars_string(df=df, col='supported_cpu')
The output:
I want to map this output back to the data frame so that I can encode the column for the ML model.
How can I store the output in the data frame? Note: some rows have multiple values (more than one CPU).
Input: a string column with comma-separated values. Output: I do not know what it should be.
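One possible shape for the output is a multi-hot encoding: one 0/1 column per unique CPU, so a row that lists several CPUs gets several 1s. The following is only a minimal sketch under assumptions, not a definitive answer: the sample rows, the CPU names, the `name` column, and the `cpu_` prefix are all made up for illustration. It relies on pandas' `Series.str.get_dummies`.

```python
import pandas as pd

# Hypothetical sample data; the real df and its CPU names will differ.
df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "supported_cpu": ["x86, arm64", "arm64", "x86,ppc64le , arm64"],
})

# Normalise whitespace around the commas so " arm64" and "arm64" match.
cleaned = (
    df["supported_cpu"]
    .fillna("")
    .str.replace(r"\s*,\s*", ",", regex=True)
    .str.strip()
)

# One 0/1 column per unique CPU; a row with several CPUs gets several 1s.
dummies = cleaned.str.get_dummies(sep=",")

# The column labels double as the list of unique values.
supported_cpu_list = list(dummies.columns)

# Attach the encoded columns to the original data frame.
cpu_dummies = dummies.add_prefix("cpu_")
df_encoded = df.join(cpu_dummies)
```

Here `df_encoded` keeps the original columns and gains one numeric column per CPU, which most ML libraries can consume directly. If the number of distinct CPUs is very large, a different encoding may be more appropriate.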