I'd really appreciate some help on this.

import glob
import pandas as pd

files = glob.glob('CASTp_Total/**/*.pocInfo')

pdb = pd.read_excel("C:/Users/User/Documents/Research/6 - CASTp/CASTp-outputs-v3.xlsx")
code = pdb['PDB code']
long = pdb['CASTp job name (1.4A)']
res = {long[i]: code[i] for i in range(len(long))}

for file in files:
    df = pd.read_csv(file, sep='\t')
    df['PDB'].map(res)
    df.to_csv(out, sep = '\t') 

Basically, I've created a dictionary to map over the current strings in a dataframe column. When I run the Python script, I end up with the same original values, and the mapping does not occur. I'm building the dictionary from a very long Excel file with too many values to put in this post.

The shape
  • Since folks don't have your dataset, could you post a [simple reproducible version of it in the post itself](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples)? Also, please show the expected/actual output. Thanks. – ggorlen Jun 02 '21 at 19:56

1 Answer

The issue is df['PDB'].map(res). Series.map does not change the "PDB" column of the existing df dataframe in place; it returns a new Series, which your code discards. Hence, when you call df.to_csv(out, sep = '\t'), you are still writing the original, unaltered dataframe.
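
A minimal sketch of that behaviour, in case it helps to see it in isolation. The dictionary and dataframe here are made-up stand-ins for the question's data, not the real CASTp values:

```python
import pandas as pd

# Hypothetical stand-ins for the question's lookup dict and dataframe
res = {"job_a": "1ABC", "job_b": "2XYZ"}
df = pd.DataFrame({"PDB": ["job_a", "job_b", "job_a"]})

# map() builds and returns a new Series; df itself is left untouched
mapped = df["PDB"].map(res)

print(df["PDB"].tolist())  # original column, still the job names
print(mapped.tolist())     # new Series holding the mapped codes
```

Unless the returned Series is assigned to something, it is simply thrown away.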

To solve this, replace df['PDB'].map(res) with df['PDB'] = df['PDB'].map(res). The loop would then look like this:

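for file in files:
    df = pd.read_csv(file, sep='\t')
    # Assign the result back: map() returns a new Series
    df['PDB'] = df['PDB'].map(res)
    # `out` is not defined in the question's snippet; it must be set before this line
    df.to_csv(out, sep='\t')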
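
If you want to check the fix without the original files, here is a rough self-contained sketch. Everything in it is assumed for illustration: the tiny lookup table stands in for the Excel sheet, the `raw` string stands in for one .pocInfo file, and the `volume` column is invented. It also covers one thing worth knowing: map() yields NaN for any value that is not a key in the dictionary, so this version falls back to the original string in that case.

```python
import io

import pandas as pd

# Hypothetical lookup table standing in for the Excel sheet
pdb = pd.DataFrame({
    "PDB code": ["1ABC", "2XYZ"],
    "CASTp job name (1.4A)": ["job_a", "job_b"],
})
# Same dictionary as in the question, built with zip instead of indexing
res = dict(zip(pdb["CASTp job name (1.4A)"], pdb["PDB code"]))

# Hypothetical tab-separated content standing in for one .pocInfo file
raw = "PDB\tvolume\njob_a\t10.5\njob_b\t20.0\njob_c\t7.25\n"
df = pd.read_csv(io.StringIO(raw), sep="\t")

# map() returns a new Series; "job_c" has no entry in res, so it becomes NaN
mapped = df["PDB"].map(res)

# Assign back, keeping the original string wherever no mapping exists
df["PDB"] = mapped.fillna(df["PDB"])

# Write to an in-memory buffer here; a real script would pass a file path
buffer = io.StringIO()
df.to_csv(buffer, sep="\t", index=False)
result = buffer.getvalue()
print(result)
```

Whether to keep unmapped values or leave them as NaN depends on what you need downstream; the fillna step is just one option. Passing index=False is also optional and only stops pandas from writing the row index as an extra column.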
Georgy Kopshteyn