
I wrote a script that compares two files, finds the words they have in common, and reports how many times each one appears. This information is saved to a new file. Unfortunately, that output file contains both numbers and words, and I only need the rows that start with a word (a non-numeric string).

The initial code is this:

f1 = open("file1.txt", 'r')
f2 = open("file2.txt", 'r')

words1 = f1.read().split()
words2 = f2.read().split()
words = set(words1) & set(words2)

with open('outfile.txt', 'w') as output:
    for word in words:
        output.write('{} appears {} times in f1 and {} times in f2.\n'.format(word, words1.count(word), words2.count(word)))

The output file contains lines like the ones below, and I only want the lines that start with a word, e.g. the ACTION line here:

ACTION appears 1 times in f1 and 1 times in f2.
1150.00 appears 3 times in f1 and 1 times in f2.
1.18233875e-05 appears 1 times in f1 and 1 times in f2.
2.52229049e-09-1.85248240e-13 appears 1 times in f1 and 1 times in f2.
8.85017800e-09-1.22652064e-12-1.37945792e+04 appears 1 times in f1 and 1 times in f2.
mkrieger1
  • Does this answer your question? [How can I check if a string only contains letters in Python?](https://stackoverflow.com/questions/18667410/how-can-i-check-if-a-string-only-contains-letters-in-python) – mkrieger1 May 20 '22 at 10:06
  • How about 'words' like `'1tel'` or `'ready123'` - would you want to exclude anything that contains numbers, or only things that are numbers in their entirety? – Grismar May 20 '22 at 10:08
  • Dear @Grismar, thank you for your question. I would also like to include ones like 1tel or ready123 – batardavelo May 20 '22 at 10:35

2 Answers


You can use isalpha() to check whether a character is a letter. Just check the first character of each word:

with open("file1.txt", 'r') as f1, open("file2.txt", 'r') as f2:
    words1 = f1.read().split()
    words2 = f2.read().split()
words = set(words1) & set(words2)

with open('outfile.txt', 'w') as output:
    for word in words:
        # Keep only words whose first character is a letter
        if word[0].isalpha():
            output.write('{} appears {} times in f1 and {} times in f2.\n'.format(word, words1.count(word), words2.count(word)))
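To see which tokens the first-character test keeps, here is a small sketch run on tokens from the question's output and comments. The helper name `starts_with_letter` is hypothetical. Note that '1tel', which the asker said should be kept, is rejected by this test.

```python
def starts_with_letter(word: str) -> bool:
    """Hypothetical helper: True if the first character is a letter."""
    # Guard against empty strings, where word[0] would raise IndexError
    return bool(word) and word[0].isalpha()

# Sample tokens taken from the question's output and comments
samples = ['ACTION', '1150.00', '1.18233875e-05', 'ready123', '1tel']
kept = [w for w in samples if starts_with_letter(w)]
print(kept)  # ['ACTION', 'ready123']
```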
Mortz

User @boscohidalgo suggests using isalpha() to check whether the text consists of letters, but what you really want to know is whether the text is not numeric. After all, you want to include text like 'ready123', yet 'ready123'.isalpha() returns False, and a first-character check would reject '1tel'.

This would work:

if not word.isnumeric():
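A quick check of how str.isnumeric() treats a few tokens (the list is illustrative, partly taken from the question's output). It returns True only when every character is numeric, so a sign, a decimal point or an exponent makes it return False.

```python
# str.isnumeric() is True only if every character is a numeric character
tokens = ['123', '-100', '1150.00', '1.18233875e-05', 'ACTION', 'ready123']
for t in tokens:
    print(f'{t!r}: isnumeric() -> {t.isnumeric()}')

# Only '123' counts as numeric, so `if not word.isnumeric()` would keep
# '-100', '1150.00' and '1.18233875e-05' as if they were words.
kept = [t for t in tokens if not t.isnumeric()]
```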

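However, this doesn't catch something like '-100' or '1150.00': isnumeric() returns False as soon as the string contains a sign or a decimal point. Since you probably want to exclude those, you can simply try to convert the word to a floating-point number and keep it only if that fails (note that float() rejects thousands separators, so something like '1,000.00' will still get through):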

try:
    float(word)
except ValueError:
    # float() failed, so this is a word: do something with it here
    pass
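Wrapped as a function, the try/except idea might look like this sketch (the name `is_number` is hypothetical). Catching ValueError rather than using a bare except avoids hiding unrelated errors. Be aware that float() rejects fused values such as '2.52229049e-09-1.85248240e-13' from the question's output, so those would still be treated as words, while it accepts strings like 'nan', 'inf' and 'infinity', so those would be treated as numbers.

```python
def is_number(word: str) -> bool:
    """Hypothetical helper: True if float() can parse the word."""
    try:
        float(word)
    except ValueError:
        # float() could not parse it, so treat it as a word
        return False
    return True

samples = ['ACTION', '-100', '1150.00', '1.18233875e-05',
           '2.52229049e-09-1.85248240e-13', '1,000.00', 'ready123']
words_only = [w for w in samples if not is_number(w)]
print(words_only)
```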

In your code:

with open("file1.txt", 'r') as f1, open("file2.txt", 'r') as f2:
    words1 = f1.read().split()
    words2 = f2.read().split()
    words = set(words1) & set(words2)

with open('outfile.txt', 'w') as output:
    for word in words:
        try:
            float(word)
        except ValueError:
            output.write(f'{word} appears {words1.count(word)} times in f1 and {words2.count(word)} times in f2.\n')
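If the files are large, note that words1.count(word) scans the whole list again for every shared word. The sketch below counts each file once with collections.Counter and applies the same try/except test; the function names `is_number` and `report_lines` are hypothetical, and it sorts the shared words so the output order is deterministic (iterating a set gives no fixed order).

```python
from collections import Counter

def is_number(word: str) -> bool:
    """True if float() can parse the word (same try/except test as this answer)."""
    try:
        float(word)
    except ValueError:
        return False
    return True

def report_lines(words1, words2):
    """Build one report line per word that appears in both lists and is not a number."""
    # Counter tallies every word in a single pass over each list
    counts1, counts2 = Counter(words1), Counter(words2)
    lines = []
    # dict keys support set intersection; sorted() fixes the output order
    for word in sorted(counts1.keys() & counts2.keys()):
        if not is_number(word):
            lines.append(f'{word} appears {counts1[word]} times in f1 '
                         f'and {counts2[word]} times in f2.')
    return lines

# Usage with the question's files:
# with open('file1.txt') as f1, open('file2.txt') as f2:
#     lines = report_lines(f1.read().split(), f2.read().split())
# with open('outfile.txt', 'w') as output:
#     output.writelines(line + '\n' for line in lines)
```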

If speed matters, note that the various methods perform differently. This question goes into detail about alternatives to this simple approach: How do I check if a string is a number (float)?
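The sample output in the question contains fused values such as 2.52229049e-09-1.85248240e-13, which float() rejects, so the try/except version would still write those lines. One alternative is a regular-expression test, sketched here under the assumption that numeric tokens only ever contain digits, signs, decimal points and e/E exponent markers (the names `NUMERIC_TOKEN` and `looks_numeric` are hypothetical):

```python
import re

# Assumption: a numeric token is built only from digits, '.', '+', '-', 'e' and 'E'
# and contains at least one digit. This also matches fused values like
# '2.52229049e-09-1.85248240e-13' that float() cannot parse.
NUMERIC_TOKEN = re.compile(r'[0-9.eE+-]+')

def looks_numeric(token: str) -> bool:
    """Hypothetical helper: True if the token looks like one or more numbers."""
    return bool(NUMERIC_TOKEN.fullmatch(token)) and any(c.isdigit() for c in token)

samples = ['ACTION', '1150.00', '1.18233875e-05',
           '2.52229049e-09-1.85248240e-13',
           '8.85017800e-09-1.22652064e-12-1.37945792e+04',
           'ready123', '1tel']
print([t for t in samples if not looks_numeric(t)])  # ['ACTION', 'ready123', '1tel']
```

Tokens such as 'e5' or '1e' would also be treated as numeric by this pattern; whether that matters depends on the data.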

Grismar