I am attempting to split a column and keep only the third item as the column value, using the following:
df1['gene_name'] = df1.loc[:,'gene_name'].str.split(';', expand=True)[2]
I have also tried these variations:
df1['gene_name'] = df1.iloc[:,'gene_name'].str.split(';', expand=True)[2]
df1['gene_name'] = df1.loc[:,'gene_name'].str.split(';', expand=True)[2]
df1['gene_name'] = df1['gene_name'].str.split(';', expand=True)[2]
df1['gene_name'] = df1.gene_name.str.split(';', expand=True)[2]
But it always returns this warning:
find_target_genes.py:19: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df1['gene_name'] = df1.loc[:,'gene_name'].str.split(';', expand=True)[2]
I have also tried using 4 (the column index) instead of 'gene_name', but this results in an error.
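For context, a minimal sketch of the label versus position distinction that this attempt touches on. The five-column frame and its column names "a" to "d" are hypothetical, chosen only so that gene_name lands at position 4; the real df1 may be laid out differently.

```python
import pandas as pd

# Hypothetical five-column frame so that 'gene_name' sits at position 4.
df = pd.DataFrame({
    "a": [0], "b": [0], "c": [0], "d": [0],
    "gene_name": ['ID "A" ; version "B" ; name "C"'],
})

by_label = df.loc[:, "gene_name"]  # .loc selects by label
by_position = df.iloc[:, 4]        # .iloc selects by integer position

# Swapping the two fails: .iloc does not accept a string label.
iloc_error = None
try:
    df.iloc[:, "gene_name"]
except (ValueError, TypeError, IndexError) as exc:
    iloc_error = type(exc).__name__

# Likewise, .loc looks for a column literally labelled 4, which does not exist here.
loc_error = None
try:
    df.loc[:, 4]
except KeyError as exc:
    loc_error = type(exc).__name__
```

If this matches the real layout, an integer would need .iloc and the column name would need .loc (or plain df1['gene_name']).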
How can I make this work? I've looked through the documentation, but I don't think I fully understand it, since I can't figure out what's wrong.
Here is an example of 2 of the rows in the column I am trying to split (yes, this is all in one column):
ID "A" ; version "B" ; name "C" ; source "D" ; transcript "C"
ID "A1" ; version "B1" ; name "C1" ; source "D1" ; transcript "C1"
I would like the column to contain only name "C" and to get rid of the rest.
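To make the goal concrete, here is a minimal sketch of the intended result. It rests on an assumption that the post does not confirm: that df1 was created earlier by filtering or slicing another DataFrame, which is a common source of SettingWithCopyWarning. The names raw and score are hypothetical stand-ins, and the sample strings follow the two example rows.

```python
import pandas as pd

# Hypothetical stand-in for the real data: the 'gene_name' values follow the
# two example rows; the 'score' column is invented for illustration.
raw = pd.DataFrame({
    "score": [1, 2],
    "gene_name": [
        'ID "A" ; version "B" ; name "C" ; source "D" ; transcript "C"',
        'ID "A1" ; version "B1" ; name "C1" ; source "D1" ; transcript "C1"',
    ],
})

# Assumption: df1 is derived from another DataFrame by filtering. Taking an
# explicit copy makes df1 independent of raw, so assigning to it is unambiguous.
df1 = raw[raw["score"] > 0].copy()

# Split on ';', take the third piece (index 2), and trim the padding spaces.
df1["gene_name"] = df1["gene_name"].str.split(";").str[2].str.strip()

print(df1["gene_name"].tolist())
```

Under that assumption, the split expression itself is not the issue; the warning would come from how df1 was created, so the .copy() belongs on the line that first builds df1.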