Read a file .txt and write in a new .txt file only the rows that start with string values

Question

I wrote a script that compares two files, finds the elements they share, and reports how many times each one appears. This information is saved in a new file. Unfortunately, that file contains both numbers and words. I need only the rows that start with a word (a string), not a number.

The initial code is this:

f1 = open("file1.txt", 'r')
f2 = open("file2.txt", 'r')

words1 = f1.read().split()
words2 = f2.read().split()
words = set(words1) & set(words2)

with open('outfile.txt', 'w') as output:
    for word in words:
        output.write('{} appears {} times in f1 and {} times in f2.\n'.format(word, words1.count(word), words2.count(word)))

The output file contains text like the lines below. I need only the rows that start with a word, e.g. the ACTION row:

ACTION appears 1 times in f1 and 1 times in f2.
1150.00 appears 3 times in f1 and 1 times in f2.
1.18233875e-05 appears 1 times in f1 and 1 times in f2.
2.52229049e-09-1.85248240e-13 appears 1 times in f1 and 1 times in f2.
8.85017800e-09-1.22652064e-12-1.37945792e+04 appears 1 times in f1 and 1 times in f2.

Does this answer your question? [How can I check if a string only contains letters in Python?](https://stackoverflow.com/questions/18667410/how-can-i-check-if-a-string-only-contains-letters-in-python) — mkrieger1, May 20 '22 at 10:06
How about 'words' like `'1tel'` or `'ready123'` - would you want to exclude anything that contains numbers, or only things that are numbers in their entirety? — Grismar, May 20 '22 at 10:08
Dear @Grismar thank you for your question. I would like to include also the ones like 1tel or ready123 — batardavelo, May 20 '22 at 10:35

score 1 · Answer 1 · edited May 20 '22 at 10:18

You can use isalpha() to check whether a character is a letter. Just check the first character of each word:

# Open both inputs in a with block so they are closed automatically
with open("file1.txt", 'r') as f1, open("file2.txt", 'r') as f2:
    words1 = f1.read().split()
    words2 = f2.read().split()

# Words that occur in both files
words = set(words1) & set(words2)

with open('outfile.txt', 'w') as output:
    for word in words:
        # Keep only the words whose first character is a letter
        if word[0].isalpha():
            output.write('{} appears {} times in f1 and {} times in f2.\n'.format(word, words1.count(word), words2.count(word)))
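If outfile.txt has already been written, the same first-character check could be applied to its rows instead of to the words. The following is only a sketch: the helper names `starts_with_letter` and `filter_rows` are made up for illustration, and the last sample row is hypothetical.

```python
def starts_with_letter(line):
    """Return True if the first non-blank character of the line is a letter."""
    stripped = line.lstrip()
    return bool(stripped) and stripped[0].isalpha()


def filter_rows(lines):
    """Keep only the rows that begin with a letter."""
    return [line for line in lines if starts_with_letter(line)]


sample = [
    "ACTION appears 1 times in f1 and 1 times in f2.",
    "1150.00 appears 3 times in f1 and 1 times in f2.",
    "1.18233875e-05 appears 1 times in f1 and 1 times in f2.",
    "ready123 appears 2 times in f1 and 1 times in f2.",  # hypothetical row
]

for row in filter_rows(sample):
    print(row)
```

Note that this check rejects a row starting with something like '1tel', because its first character is a digit.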

answered May 20 '22 at 10:11 by Bosco Hidalgo · edited May 20 '22 at 10:18 by Mortz

Thank you @Bosco Hidalgo, very useful, it was my fault. – batardavelo May 20 '22 at 10:45

score 0 · Answer 2 · answered May 20 '22 at 12:01

User @boscohidalgo suggests checking for letters with isalpha(), but what you really want to know is whether the text is not numeric. After all, you want to keep text like 'ready123', and 'ready123'.isalpha() returns False.

This would work:

if not word.isnumeric():

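However, this doesn't exclude something like '-100' or '1,000.00', because isnumeric() returns False as soon as the string contains a sign, a comma, or a decimal point. Since you probably want to exclude those, you can simply try to convert the word to a floating point number and keep it only if the conversion fails. (Note that float() does not accept thousands separators, so '1,000.00' would need its commas removed first.)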

try:
    float(word)
except ValueError:
    pass  # not a number: do something with the word here
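To see how the two checks differ, here is a small comparison on a few tokens, some taken from the question and some made up. The helper name `is_float` is an assumption for this sketch.

```python
def is_float(text):
    """Return True if float() can parse the text."""
    try:
        float(text)
    except ValueError:
        return False
    return True


tokens = ["ACTION", "ready123", "1tel", "100", "-100",
          "1150.00", "1.18233875e-05", "1,000.00"]

for token in tokens:
    # Print each token with the result of both checks
    print(token, token.isnumeric(), is_float(token))
```

One caveat: float() also accepts strings such as 'nan', 'inf' and 'infinity' (in any letter case), so words like those would be treated as numbers by this check.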

In your code:

with open("file1.txt", 'r') as f1, open("file2.txt", 'r') as f2:
    words1 = f1.read().split()
    words2 = f2.read().split()
    words = set(words1) & set(words2)

with open('outfile.txt', 'w') as output:
    for word in words:
        try:
            float(word)
        except ValueError:
            # float() failed, so the word is not a plain number: keep it
            output.write(f'{word} appears {words1.count(word)} times in f1 and {words2.count(word)} times in f2.\n')
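The sample output in the question also contains tokens such as 2.52229049e-09-1.85248240e-13, where several numbers are fused together. float() raises ValueError for those, so the try/except check would keep them. If they should be dropped as well, one possible approach (a sketch, not part of the original answer) is a regular expression that matches a run of float literals joined by their signs. The name `looks_numeric` is hypothetical.

```python
import re

# One unsigned float literal: digits with optional fraction, optional exponent
_UNSIGNED = r"(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"

# An optionally signed literal, followed by any number of further
# literals that each start with an explicit sign
_NUMERIC_RUN = re.compile(rf"[+-]?{_UNSIGNED}(?:[+-]{_UNSIGNED})*")


def looks_numeric(token):
    """Return True if the whole token is one number or several fused numbers."""
    return _NUMERIC_RUN.fullmatch(token) is not None


for token in ["ACTION", "1tel", "1150.00",
              "2.52229049e-09-1.85248240e-13",
              "8.85017800e-09-1.22652064e-12-1.37945792e+04"]:
    print(token, looks_numeric(token))
```

In the loop over words, `if not looks_numeric(word):` would then replace the try/except. Be aware that this pattern also treats date-like tokens such as 2022-05-20 as numeric, which may or may not be what you want.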

There's something to be said about the speed of various methods, if you need to consider that. This question goes into detail about alternatives for this simple approach: How do I check if a string is a number (float)?
