0

I am trying to use pandas to read a column from an Excel file and fill a new column based on its values: I want to convert 3-letter codes to 1-letter codes. So far I've written the code below, but when I run it, nothing appears in the last column.

import pandas as pd
df = pd.read_csv (r'C:\Users\User\Documents\Research\seqadv.csv') 
print (df)

codes = []
for i in df['WT_RESIDUE']:
   if i == 'ALA':
    codes.append('A')
   if i == 'ARG':
    codes.append('R')
   if i == 'ASN':
    codes.append('N')
   if i == 'ASP':
    codes.append('D')
   if i == 'CYS':
    codes.append('C')
   if i == 'GLU':
    codes.append('E')
    print (codes)
codes = df ['MUTATION_CODE']
df.to_csv(r'C:\Users\User\Documents\Research\seqadv3.csv')

(Image: excel headers)

BigBen
jack22321
  • The assignment `codes = df['MUTATION_CODE']` should be reversed. It is better to use the apply method of the dataframe. – Coconut Apr 26 '21 at 20:34
  • Still won't show anything in the last column. I don't know how to apply dataframe and read from an excel file – jack22321 Apr 26 '21 at 20:41
  • No loop: `m = {'ALA': 'A', 'ARG': 'R', 'ASN': 'N', 'ASP': 'D', 'CYS': 'C', 'GLU': 'E'}`, `df['MUTATION_CODE'] = df['WT_RESIDUE'].map(m).fillna('')`. – BigBen Apr 26 '21 at 20:42
  • Console says: "SyntaxError: cannot assign to dict display" What does this mean? – jack22321 Apr 26 '21 at 20:48
  • I see you are using my script :). I think you need to write: df['MUTATION_CODE'] = codes – John Mommers Apr 26 '21 at 20:51

2 Answers

1

The way to do this is to define a dictionary with your replacement values, and then use either map() or replace() on the existing column to create the new column. The two differ in how they handle values that are not among the dictionary keys:

  • replace() leaves those values unchanged
  • map() replaces them with the dictionary's default value if it has one (for example, a collections.defaultdict), or with NaN if it does not

For example:
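import pandas as pd

# Example data only; with your own file, keep the df loaded by read_csv
df = pd.DataFrame(data={'WT_RESIDUE': ['ALA', 'REMARK', 'VAL', 'CYS', 'GLU']})

# Lookup table: 3-letter code -> 1-letter code
codes = {'ALA': 'A', 'ARG': 'R', 'ASN': 'N', 'ASP': 'D', 'CYS': 'C', 'GLU': 'E'}

df['code_m'] = df['WT_RESIDUE'].map(codes)      # unmatched values become NaN
df['code_r'] = df['WT_RESIDUE'].replace(codes)  # unmatched values stay as-is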


In: df
Out: 
  WT_RESIDUE code_m  code_r
0        ALA      A       A
1     REMARK    NaN  REMARK
2        VAL    NaN     VAL
3        CYS      C       C
4        GLU      E       E
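
The "default value" case for map() is not shown in the output above. A minimal sketch of it, assuming a collections.defaultdict as the lookup table; the 'X' placeholder and the column name code_d are hypothetical choices for illustration, not part of the question:

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'WT_RESIDUE': ['ALA', 'REMARK', 'CYS']})

# defaultdict defines __missing__, so map() uses its default instead of NaN.
# 'X' is a hypothetical placeholder for residues that are not in the table.
codes_default = defaultdict(lambda: 'X', {'ALA': 'A', 'CYS': 'C'})

df['code_d'] = df['WT_RESIDUE'].map(codes_default)
print(df['code_d'].tolist())
```

With a plain dict, the 'REMARK' row would come out as NaN instead.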

More detailed information is here: Remap values in pandas column with a dict
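
Applied to the workflow in the question, this might look roughly like the sketch below. It assumes the CSV already has WT_RESIDUE and MUTATION_CODE columns; the in-memory text and its rows are made up so the example runs without the real file, which would be loaded with pd.read_csv(path) instead.

```python
import io

import pandas as pd

# Stand-in for the real file; the rows here are made up for illustration.
csv_text = "WT_RESIDUE,MUTATION_CODE\nALA,\nVAL,\nGLU,\n"
df = pd.read_csv(io.StringIO(csv_text))

codes = {'ALA': 'A', 'ARG': 'R', 'ASN': 'N', 'ASP': 'D', 'CYS': 'C', 'GLU': 'E'}

# Fill the existing column; residues missing from the dict become ''.
df['MUTATION_CODE'] = df['WT_RESIDUE'].map(codes).fillna('')

# Write the result out (to a string here; pass a path to write a file).
out_text = df.to_csv(index=False)
print(out_text)
```

Note that df is not rebuilt with pd.DataFrame(...) here: the mapping is applied to the frame that was read from the file.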

JJL
  • I tried this, but it doesn't replace MUTATION_CODE when I run the code. The column remains empty – jack22321 Apr 27 '21 at 03:19
  • Hi, I removed the line `df = pd.DataFrame(data={'WT_RESIDUE':['ALA', 'REMARK', 'VAL', 'CYS', 'GLU']})` and it worked – jack22321 Apr 28 '21 at 17:18
0

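The assignment is reversed: `codes = df['MUTATION_CODE']` overwrites your list instead of filling the column. Write: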

df['MUTATION_CODE'] = codes
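
One caveat: the list needs exactly one entry per row, otherwise pandas raises a ValueError about mismatched lengths. The loop in the question only appends for six residues, so any other value would leave the list too short. A minimal sketch of a loop that always appends, assuming '' is an acceptable marker for unknown residues (the rows and the name lookup are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'WT_RESIDUE': ['ALA', 'VAL', 'GLU']})  # made-up rows

lookup = {'ALA': 'A', 'ARG': 'R', 'ASN': 'N', 'ASP': 'D', 'CYS': 'C', 'GLU': 'E'}

codes = []
for residue in df['WT_RESIDUE']:
    # Append exactly one value per row; '' marks residues not in the table.
    codes.append(lookup.get(residue, ''))

df['MUTATION_CODE'] = codes  # list length now matches the number of rows
print(df)
```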
John Mommers