I am trying to write a script to parse an NCBI BLAST report. The column that is causing this error is the genome GI number.

E.g. LT697097.1

There is a decimal point at the end. When I try to split on it to get just the GI number, I get this error.

The question "Django AttributeError 'float' object has no attribute 'split'" tells me that this error occurs because the value being split is treated as a float.

So I followed the advice from "Pandas reading csv as string type" to import the pandas column as a string.

I am passing the column names explicitly, as the report doesn't automatically have column names.

import pandas as pd
df = pd.read_csv("out.txt", sep="\t", dtype=object,
                 names=['query id', 'subject ids', 'query acc.ver', 'subject acc.ver',
                        '% identity', 'alignment length', 'mismatches', 'gap opens',
                        'q.start', 'q.end', 's.start', 's.end', 'evalue', 'bit score'])

sacc = df['subject acc.ver']
sacc = [i.split('.',1)[0] for i in sacc]

I still get the error AttributeError: 'float' object has no attribute 'split'.

I then tried astype(str), as suggested by "Convert Columns to String in Pandas".

This fails to read the column; the output value contains only the column's name attribute.

Can you please advise me on where I'm going wrong in my approach?

1 Answer

I think you need str.split followed by str[0] to select the first element of each list, which works very nicely with NaNs. It also handles another possible problem, values without a '.':

df['subject acc.ver'] = df['subject acc.ver'].str.split('.', n=1).str[0]

Sample:

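import numpy as np
import pandas as pd

df = pd.DataFrame({'subject acc.ver':['LT697097.1',np.nan,None, 'LT6']})

# n=1 is passed by keyword; recent pandas versions no longer accept it positionally
df['subject acc.ver'] = df['subject acc.ver'].str.split('.', n=1).str[0]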
print (df)
  subject acc.ver
0        LT697097
1             NaN
2            None
3             LT6
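
To tie this back to the question, here is a minimal sketch of the whole flow. The two-row sample string and the shortened three-column list are hypothetical stand-ins for out.txt, not the real report. It assumes the original error comes from empty fields: read_csv turns them into NaN even when a string dtype is requested, so a plain Python loop ends up calling split on a float, while the .str accessor passes missing values through.

```python
import io

import pandas as pd

# Hypothetical two-row stand-in for out.txt (tab-separated, no header row);
# the second row has an empty subject accession field.
sample = "q1\tLT697097.1\t99.5\nq2\t\t98.0\n"
cols = ["query id", "subject acc.ver", "% identity"]  # shortened for the example

df = pd.read_csv(io.StringIO(sample), sep="\t", names=cols, dtype=str)

# The empty field is parsed as a missing value (NaN), not as a string,
# which is what breaks a list comprehension that calls .split on each item.
missing = df["subject acc.ver"].isna()

# The vectorised .str accessor leaves missing values as they are instead of raising.
df["accession"] = df["subject acc.ver"].str.split(".", n=1).str[0]
```

If the missing rows are not needed, calling dropna(subset=['subject acc.ver']) before the split is another option.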
jezrael