Removing characters from a txt file using Python

Question

I'm writing a program in Python that asks the user for a file name, opens the file, counts the number of M's and F's, and reports them as a ratio. I can get it to do that and to remove whitespace, but I can't figure out how to remove characters that are not M or F. I want to remove all incorrect characters and write them to a new file. Here's what I have so far:

fname = raw_input('Please enter the file name: ')  #Requests input from user
try:                                                #Makes sure the file input is valid
   fhand = open(fname)
except:
   print 'Error. Invalid file name entered.'
   exit()
else:
  fhand = open(fname, 'r')            #opens the file for reading

  entireFile = fhand.read()           
  fhand.close()
  entireFile.split()           #Removes whitespace
  ''.join(entireFile)         #Rejoins the characters

  entireFile = entireFile.upper() #Converts all characters to capitals letters

  males = entireFile.count('M')
  print males
  females = entireFile.count('F')
  print females
  males = float(males)
  females = float(females)
  length = males + females
  print length
  length = float(length)
  totalMales = (males / length) * 1
  totalFemales = (females / length) * 1

  print "There are %", totalMales, " males and %", totalFemales, " in the file."

Why not iterate over the file contents once and perform an action for each character? For example, if the current character is M or F, add one to a variable. Else, remove it from the current file and append to a new file. — Jacob Bridges, Mar 31 '14 at 20:01
My suggestions for improving your code and understanding: 1) `split` with no arguments splits on all whitespace, but it returns a new list and leaves `entireFile` unchanged, so the `split` and `join` lines as written have no effect. 2) You are opening `fhand` twice, which is redundant and, I think, may leave the original `fhand` open; see http://stackoverflow.com/questions/82831/how-do-i-check-if-a-file-exists-using-python for a solution that doesn't involve opening the file twice, or just use the first `fhand` you create. 3) Putting the bulk of your code in a giant `else` is not necessary in this case, since you `exit()` if you hit an exception. — Owen S., Mar 31 '14 at 20:13
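
A minimal sketch of what the `split` and `join` lines in the question do (written for Python 3, with a made-up sample string standing in for the file contents): both calls return new objects rather than changing the original string, so their results must be assigned to have any effect.

```python
# Hypothetical sample standing in for the contents read from the file.
entire_file = "m f\tM\nx F\n"

# str methods never modify the string in place; this result is discarded.
entire_file.split()
unchanged = entire_file  # still contains the spaces, tab, and newlines

# Assigning the results is what actually strips the whitespace.
pieces = entire_file.split()      # splits on any run of whitespace
no_whitespace = "".join(pieces)   # joins the pieces with nothing between them

print(repr(unchanged))
print(repr(no_whitespace))
```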

Ammar · Answer 1 · 2014-03-31T20:17:25.820

The easiest way is to use a regex:

import re
data = re.findall(r'[FM]', entirefile)

If you use r'[FMfm]' instead, you don't need to upper-case the whole file; the regex will match both upper and lower case.

This returns all the F's and M's, so there is no need to remove whitespace at all.

Example:

entirefile = "MLKMADG FKFLJKASDM LKMASDLKMADF MASDLDF"
data = re.findall(r'[FM]', entirefile)
# data is now ['M', 'M', 'F', 'F', 'M', 'M', 'M', 'F', 'M', 'F']

You can then do whatever you want with this list.
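
As one example of what could be done with that list, the ratio the question asks for might be computed with `collections.Counter`. This is only a sketch in Python 3: the helper name `gender_ratio` is made up for illustration, and `re.IGNORECASE` is used here as an alternative to the `[FMfm]` pattern.

```python
import re
from collections import Counter


def gender_ratio(text):
    """Return (male_fraction, female_fraction) for the M/F characters in text."""
    # Find every M or F regardless of case, then normalise to upper case.
    matches = [ch.upper() for ch in re.findall(r'[FM]', text, re.IGNORECASE)]
    counts = Counter(matches)
    total = counts['M'] + counts['F']
    if total == 0:
        # No valid characters at all: avoid dividing by zero.
        return 0.0, 0.0
    return counts['M'] / total, counts['F'] / total


entirefile = "MLKMADG FKFLJKASDM LKMASDLKMADF MASDLDF"
males, females = gender_ratio(entirefile)
print("{:.0%} males, {:.0%} females".format(males, females))
```

Returning fractions rather than pre-formatted strings leaves the choice of output format to the caller.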

Hope this helps.

score 1 · Answer 2 · answered Mar 31 '14 at 20:04

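# Sort each character into one of three lists in a single pass
m, f, other = [], [], []
for ch in entireFile:
    if ch == "M": m.append(ch)
    elif ch == "F": f.append(ch)
    else: other.append(ch)

# len() returns an int, so convert it before concatenating with strings
print str(len(m)) + " Males, " + str(len(f)) + " Females"
print "Other:", other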

answered Mar 31 '14 at 20:04

Joran Beasley


score 1 · Accepted Answer · answered Mar 31 '14 at 20:04

Use a regular expression to strip out every M and F, leaving only the characters that are not M or F, then write those to a new file:

import re
remainder = re.sub(r'M|F', '', entireFile)
with open('new_file', 'wb') as f:
    f.write(remainder)
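
Putting the pieces from this thread together, a sketch of the whole task might look like the following. It is written for Python 3 and makes a few assumptions that are not in the original: the function name `split_valid_chars` is hypothetical, whitespace is dropped silently rather than treated as an invalid character, and matching is case-insensitive.

```python
import re


def split_valid_chars(in_path, out_path):
    """Count M/F in in_path and write every other non-whitespace character to out_path."""
    with open(in_path) as src:
        text = src.read()

    # Count on an upper-cased copy so 'm' and 'f' are included.
    upper = text.upper()
    males = upper.count('M')
    females = upper.count('F')

    # Remove M, F (either case), and whitespace; what remains is invalid input.
    rejected = re.sub(r'[MF\s]', '', text, flags=re.IGNORECASE)
    with open(out_path, 'w') as dst:
        dst.write(rejected)

    return males, females, rejected
```

The rejected characters are taken from the original text rather than the upper-cased copy, so they reach the new file exactly as they appeared in the input.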

answered Mar 31 '14 at 20:04

spinlok

