I would like to extend the question: splitting a column by delimiter pandas python

import pandas as pd

df = pd.DataFrame({'V': ['IGHV7-B*01','IGHV7-B*01','IGHV6-A*01','GHV6-A*01','IGHV6-A*01','IGHV6-A*01','IGHV4-L*03','IGHV4-L*03','IGHV5-A*01','IGHV5-A*04','IGHV6-A*02','IGHV6-A*02']})

Now, I would like to only keep the new names:

df[['Name','allele']] = df['V'].str.split('-',expand=True)

But the df stores "V" too:

df 

    V           Name    allele
0   IGHV7-B*01  IGHV7   B*01
1   IGHV7-B*01  IGHV7   B*01

...

Is there a handy keyword argument for doing that? I know I can do:

df.drop(columns='V', inplace=True)

I would prefer a keyword argument over another line of code because, in my project, I have to repeat the same step several times and I have a total of 25 names there.

Peer Breier
  • Possible duplicate of [Splitting a column in dataframe using str.split function](https://stackoverflow.com/questions/57463127/splitting-a-column-in-dataframe-using-str-split-function) – Trenton McKinney Oct 21 '19 at 17:29

1 Answer

You can create a new DataFrame and use DataFrame.rename:

new_df = df['V'].str.split('-', expand=True).rename(columns={0: 'Name', 1: 'allele'})
print(new_df)

     Name allele
0   IGHV7   B*01
1   IGHV7   B*01
2   IGHV6   A*01
3    GHV6   A*01
4   IGHV6   A*01
5   IGHV6   A*01
6   IGHV4   L*03
7   IGHV4   L*03
8   IGHV5   A*01
9   IGHV5   A*04
10  IGHV6   A*02
11  IGHV6   A*02

If you do not want to create a new DataFrame, and instead want to store the result in the original DataFrame and delete 'V' in a single line, you can use pd.concat:

df = pd.concat([df.loc[:, ~df.columns.isin(['V', 'allele', 'Name'])], df['V'].str.split('-', expand=True).rename(columns={0: 'Name', 1: 'allele'})], axis=1)
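
Another option that may be closer to the "no extra line" goal is DataFrame.pop, which removes a column and returns it in one call. This is only a minimal sketch on a shortened copy of the sample data, not a dedicated keyword of str.split:

```python
import pandas as pd

# Shortened version of the sample data from the question.
df = pd.DataFrame({'V': ['IGHV7-B*01', 'IGHV6-A*01', 'IGHV4-L*03']})

# pop() removes 'V' from df and returns it as a Series, so the split
# result takes the place of the original column in a single statement.
df[['Name', 'allele']] = df.pop('V').str.split('-', expand=True)

print(df)
```

The right-hand side is evaluated first, so 'V' is already gone by the time the two new columns are assigned.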
ansev
  • Good idea! In my case, I would like to avoid having to specify the names again, because I have 25 of them. – Peer Breier Oct 21 '19 at 17:20
  • What names do you mean? – ansev Oct 21 '19 at 17:29
  • I mean those: columns={0:'Name',1:'allele'} – Peer Breier Oct 21 '19 at 17:31
  • If you save the data in existing columns, you have to specify the label of each column you save into, so you have to write it out, as in the example in your question. Likewise, if you create new columns, you have to specify their names. In other words, if you want the columns to have those names, you have to specify them somewhere. Are you going to apply this operation to more columns than V? – ansev Oct 21 '19 at 17:39
  • The "problem" is that I save it in this format: problem = { 'num_vars': 25, 'names': ['Name1', 'Name2', '...], 'bounds': [[1, 2],[...]] } – Peer Breier Oct 21 '19 at 17:44
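
If the labels already live in a list such as the 'names' entry above, one way to avoid retyping them might be a small helper that takes the list as an argument. The sketch below is an assumption-laden illustration: split_column is a hypothetical helper, and the two-entry problem dict and the 'other' column are stand-ins, since the thread does not show how the 25 names map onto the split parts.

```python
import pandas as pd

def split_column(df, col, names, sep='-'):
    # Hypothetical helper: split df[col] on sep, label the parts with names,
    # and return a new frame without the original column.
    parts = df[col].str.split(sep, expand=True)
    parts.columns = names  # raises ValueError if the lengths do not match
    return df.drop(columns=col).join(parts)

# Assumed stand-in for the 'names' entry of the problem dict in the comment.
problem = {'names': ['Name', 'allele']}

df = pd.DataFrame({'V': ['IGHV7-B*01', 'IGHV6-A*01'], 'other': [1, 2]})
result = split_column(df, 'V', problem['names'])
print(result)
```

The helper returns a new DataFrame rather than modifying df in place, so it can be called repeatedly on different columns without side effects.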