I'm trying to extract a floating-point number from a line in a text file. I tried using the removeprefix() method, but it keeps raising an AttributeError, as you can see below.

inp = input("Enter a filename: ")
var = open(inp)
prefix = "X-DSPAM-Confidence: "
num = 0
count = 0
for line in var:
    if line.startswith(prefix):
        suffix = line.removeprefix(prefix)
        num = num + float(suffix)
        count = count + 1
print(num/count)

This is the error message:

Traceback (most recent call last):
  File "C:/Users/Leonard/OneDrive/Documents/Python projects/FileReader_LineFinder/main.py", line 8, in <module>
    suffix = line.removeprefix(prefix)
AttributeError: 'str' object has no attribute 'removeprefix'

Here is what the text file looks like:

From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008
Return-Path: <postmaster@collab.sakaiproject.org>
Received: from murder (mail.umich.edu [141.211.14.90])
     by frankenstein.mail.umich.edu (Cyrus v2.3.8) with LMTPA;
     Sat, 05 Jan 2008 09:14:16 -0500
X-Sieve: CMU Sieve 2.3
Received: from murder ([unix socket])
     by mail.umich.edu (Cyrus v2.2.12) with LMTPA;
     Sat, 05 Jan 2008 09:14:16 -0500
Received: from holes.mr.itd.umich.edu (holes.mr.itd.umich.edu [141.211.14.79])
    by flawless.mail.umich.edu () with ESMTP id m05EEFR1013674;
    Sat, 5 Jan 2008 09:14:15 -0500
Received: FROM paploo.uhi.ac.uk (app1.prod.collab.uhi.ac.uk [194.35.219.184])
    BY holes.mr.itd.umich.edu ID 477F90B0.2DB2F.12494 ; 
     5 Jan 2008 09:14:10 -0500
Received: from paploo.uhi.ac.uk (localhost [127.0.0.1])
    by paploo.uhi.ac.uk (Postfix) with ESMTP id 5F919BC2F2;
    Sat,  5 Jan 2008 14:10:05 +0000 (GMT)
Message-ID: <200801051412.m05ECIaH010327@nakamura.uits.iupui.edu>
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
Received: from prod.collab.uhi.ac.uk ([194.35.219.182])
          by paploo.uhi.ac.uk (JAMES SMTP Server 2.1.3) with SMTP ID 899
          for <source@collab.sakaiproject.org>;
          Sat, 5 Jan 2008 14:09:50 +0000 (GMT)
Received: from nakamura.uits.iupui.edu (nakamura.uits.iupui.edu [134.68.220.122])
    by shmi.uhi.ac.uk (Postfix) with ESMTP id A215243002
    for <source@collab.sakaiproject.org>; Sat,  5 Jan 2008 14:13:33 +0000 (GMT)
Received: from nakamura.uits.iupui.edu (localhost [127.0.0.1])
    by nakamura.uits.iupui.edu (8.12.11.20060308/8.12.11) with ESMTP id m05ECJVp010329
    for <source@collab.sakaiproject.org>; Sat, 5 Jan 2008 09:12:19 -0500
Received: (from apache@localhost)
    by nakamura.uits.iupui.edu (8.12.11.20060308/8.12.11/Submit) id m05ECIaH010327
    for source@collab.sakaiproject.org; Sat, 5 Jan 2008 09:12:18 -0500
Date: Sat, 5 Jan 2008 09:12:18 -0500
X-Authentication-Warning: nakamura.uits.iupui.edu: apache set sender to stephen.marquard@uct.ac.za using -f
To: source@collab.sakaiproject.org
From: stephen.marquard@uct.ac.za
Subject: [sakai] svn commit: r39772 - content/branches/sakai_2-5-x/content-impl/impl/src/java/org/sakaiproject/content/impl
X-Content-Type-Outer-Envelope: text/plain; charset=UTF-8
X-Content-Type-Message-Body: text/plain; charset=UTF-8
Content-Type: text/plain; charset=UTF-8
X-DSPAM-Result: Innocent
X-DSPAM-Processed: Sat Jan  5 09:14:16 2008
X-DSPAM-Confidence: 0.8475
X-DSPAM-Probability: 0.0000

Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772

Author: stephen.marquard@uct.ac.za
Date: 2008-01-05 09:12:07 -0500 (Sat, 05 Jan 2008)
New Revision: 39772

Modified:
content/branches/sakai_2-5-x/content-impl/impl/src/java/org/sakaiproject/content/impl/ContentServiceSqlOracle.java
content/branches/sakai_2-5-x/content-impl/impl/src/java/org/sakaiproject/content/impl/DbContentService.java
Log:
SAK-12501 merge to 2-5-x: r39622, r39624:5, r39632:3 (resolve conflict from differing linebreaks for r39622)

The expected output is the average of all numbers preceded by 'X-DSPAM-Confidence:'. You can see one such line in the text file snippet above.

How can I fix my code to produce the correct result?

ggorlen
  • Please edit your post to show a representative snippet of the file and expected output. `removeprefix` is not a function in your version of Python (it was introduced in 3.9). Can you verify your version? Thanks. – ggorlen Jul 05 '21 at 16:50
  • I edited the post to include a text file snippet, and underneath it is an explanation of the expected output. Is there any other way I can extract that number without using the `removeprefix()` method? –  Jul 05 '21 at 17:52

2 Answers

The removeprefix() method was introduced in Python 3.9, released in October 2020. Please check which version of Python you are using to run your file.
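
One quick way to check, sketched here with the standard sys module (the variable name has_removeprefix is just for illustration):

```python
import sys

# str.removeprefix() only exists on Python 3.9 and later
has_removeprefix = sys.version_info >= (3, 9)

print(sys.version)
print("str.removeprefix available:", has_removeprefix)
```

If this prints False, the AttributeError from the question is expected on that interpreter.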

Chandral
  • Is there any other method I can use on Python 3.8.8 instead of `removeprefix()`? –  Jul 05 '21 at 17:57

removeprefix is only available in Python 3.9+. You can upgrade, or implement it yourself (with some subtle differences from the built-in implementation):

def removeprefix(prefix, s):
    return s[len(prefix):] if s.startswith(prefix) else s
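
As a rough sketch, this is how that helper could slot into the loop from the question. The sample_lines list stands in for the file, and the second confidence value (0.6178) is made up for illustration:

```python
def removeprefix(prefix, s):
    return s[len(prefix):] if s.startswith(prefix) else s

prefix = "X-DSPAM-Confidence: "

# Stand-in for the lines of the file; the second confidence value is invented
sample_lines = [
    "X-DSPAM-Processed: Sat Jan  5 09:14:16 2008\n",
    "X-DSPAM-Confidence: 0.8475\n",
    "X-DSPAM-Probability: 0.0000\n",
    "X-DSPAM-Confidence: 0.6178\n",
]

total = 0.0
count = 0
for line in sample_lines:
    if line.startswith(prefix):
        # float() ignores the trailing newline left on the line
        total += float(removeprefix(prefix, line))
        count += 1

average = total / count
print(average)
```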

Assuming the lines always follow the format prefix: float, you could split() the line and pop() off the last element from the list:

prefix = "X-DSPAM-Confidence: "
num = 0
count = 0

with open("foo.txt") as f:
    for line in f:
        if line.startswith(prefix):
            count += 1
            num += float(line.split().pop())

print(num / count)

Note the with open(...), which closes the file automatically at the end of the block, so the file handle isn't leaked.
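
To see the closing behavior for yourself, here is a small sketch that writes a throwaway file with the tempfile module (the file contents and variable names are only for the demo) and checks f.closed once the block has ended:

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on foo.txt existing
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("X-DSPAM-Confidence: 0.8475\n")
    path = tmp.name

with open(path) as f:
    first_line = f.readline()

# The with block has ended, so the file object is already closed here
closed_after_block = f.closed
print(closed_after_block)

os.remove(path)  # clean up the throwaway file
```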

If this isn't precise enough, you can collect a decimal number from the line with regex:

>>> import re
>>> line = "X-DSPAM-Confidence: 0.8475"
>>> re.findall(r"\b\d*\.\d+\b", line)
['0.8475']
>>> # or one result:
>>> re.search(r"\b\d*\.\d+\b", line).group()
'0.8475'
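
Putting the regex idea into the averaging loop might look something like the following. This is only a sketch: average_confidence is a name I made up, the pattern is anchored to the header name rather than being the exact regex above, and it returns None when nothing matches so you don't divide by zero:

```python
import re

# Anchor on the header name and capture the decimal number after it
pattern = re.compile(r"^X-DSPAM-Confidence:\s*(\d*\.\d+)")

def average_confidence(lines):
    """Average the confidence values found in lines, or None if there are none."""
    values = []
    for line in lines:
        match = pattern.match(line)
        if match:
            values.append(float(match.group(1)))
    return sum(values) / len(values) if values else None

# Any iterable of lines works, including an open file object
sample = ["X-DSPAM-Confidence: 0.5\n", "X-DSPAM-Probability: 0.0000\n", "X-DSPAM-Confidence: 1.5\n"]
print(average_confidence(sample))
```

With a real file, you would pass the file object from with open(...) as f straight into average_confidence(f).
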
ggorlen