0

I have a text file containing words, numbers, and other characters. I want to delete all lines that contain words or characters and keep only the lines with numbers. I noticed that every line with words and characters contains the letter "r", so I based my code on that (shown below).

The textfile contains these lines as an example:

-- for example
-- 7 Febraury 2022
5 7 1 5 3.0 2
3*2 3 5 7.0 3

I want to keep these two lines:

5 7 1 5 3.0 2
3*2 3 5 7.0 3

This is the code I wrote:

textfile = open('test.txt', 'r')
A = textfile.readlines()

L = []
for index,name in enumerate(A):
    if 'r' in name:
        L.append(index)

for idx in sorted(L, reverse = True):
    del A[idx]

I know this is not a good way to do it. Is there a better approach?

pymn

3 Answers

1

You can find only the words using a regex:

import re
with open(r'text_file.txt', 'r') as f:
    data = f.readlines()

with open(r'text_file.txt', 'w') as f:
    for line in data:
        if re.findall(r"(?!^\d+$)^.+$", line):
            f.write(line)
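
To see which lines this pattern actually keeps, a small check on in-memory strings can help before overwriting the file. The sketch below uses the sample lines from the question plus one digits-only line ("12345") that is an assumption added purely for illustration. With these inputs the pattern appears to drop only lines made up entirely of digits, so it may need adjusting for the input in the question.

```python
import re

# Pattern copied from the code above.
pattern = r"(?!^\d+$)^.+$"

# Sample lines from the question; the last line is a hypothetical
# digits-only line added for illustration.
sample = [
    "-- for example\n",
    "-- 7 Febraury 2022\n",
    "5 7 1 5 3.0 2\n",
    "3*2 3 5 7.0 3\n",
    "12345\n",
]

# A line is written back to the file when findall returns a non-empty list.
kept = [line for line in sample if re.findall(pattern, line)]

for line in kept:
    print(repr(line))
```

The negative lookahead (?!^\d+$) rejects a line only when the whole line is digits; every other non-empty line then matches ^.+$ and is kept.
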
Tal Folkman
  • thanks for your reply. I didn't know about regex. Can we exclude some letters or characters from that? For example, if there is an * in the lines, it keeps the line but for other characters, it deletes the lines? – pymn Feb 07 '22 at 07:00
  • of course! you can add what you want to the regex, if you need help doing this just say ;) @PeymanBahrami – Tal Folkman Feb 07 '22 at 07:05
  • Actually yes, I need a little help with this. I am reading the documentation about regex. It says to use a hyphen to negate a character or letter, but I don't understand where to use it. In the same line? – pymn Feb 07 '22 at 07:18
  • you need to use it in the regex. I recommend you to look at this site - https://regexr.com/ to try some regex. if you need more help, tell me the problem – Tal Folkman Feb 07 '22 at 07:22
  • I updated the question for more details. sure, thank you very much. I think I need to read more about it. – pymn Feb 07 '22 at 07:29
  • This regex is so complex. Your code now returns an empty list. I looked up the parts of your pattern: (?!...) is a negative lookahead assertion and (^$) finds the patterns, but I don't understand them when they are combined. – pymn Feb 07 '22 at 08:40
1

If you want to do this without importing anything (e.g., re) then you could do this:

keep_these = []

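def is_valid(t):
    # Treat '*' as a digit so tokens such as 3*2 pass the numeric check.
    try:
        float(t.replace('*', '0'))
        return True
    except ValueError:
        # Any token that is not a number (a word or '--') ends up here.
        return False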

with open('test.txt', encoding='utf-8') as infile:
    for line in infile:
        if all(is_valid(t) for t in line.strip().split()):
            keep_these.append(line)

print(keep_these)

The keep_these list will then contain the lines you want to keep, which you could use, for example, to rewrite the file.
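
As a rough sketch of that rewriting step, the filter and the write-back can be wrapped in one function. The name rewrite_numeric_only is hypothetical, and is_valid is repeated here only so the snippet runs on its own.

```python
def is_valid(token):
    """Return True if the token parses as a float once '*' is treated as a digit."""
    try:
        float(token.replace('*', '0'))
        return True
    except ValueError:
        return False


def rewrite_numeric_only(path):
    """Keep only lines whose tokens all pass is_valid, then overwrite the file.

    Hypothetical helper name; blank lines are kept because all() of an
    empty sequence is True.
    """
    with open(path, encoding='utf-8') as infile:
        keep_these = [
            line for line in infile
            if all(is_valid(t) for t in line.strip().split())
        ]
    # Writing happens only after the file has been fully read and closed.
    with open(path, 'w', encoding='utf-8') as outfile:
        outfile.writelines(keep_these)
    return keep_these
```

Calling rewrite_numeric_only('test.txt') would overwrite the file in place, so it may be worth trying it on a copy first.
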

DarkKnight
  • thank you for your reply. in my textfile there are some lines having both numbers and words. I want to delete them as well. The problem with this code is that it keeps those lines. – pymn Feb 07 '22 at 07:10
  • That is **NOT** what you asked for in your question. I quote: "I want to delete all lines with the characters and words, and keep the lines with numbers" – DarkKnight Feb 07 '22 at 07:21
  • Thank you Olvin, your code works very well on my example. If you don't mind, may I ask for your help with another part? My text file is so big that I couldn't post it here. Some lines contain numbers like 3.0. Your code does not accept that format and treats the line as one to be deleted. – pymn Feb 07 '22 at 08:48
  • Answer edited to allow for new information about the input/output requirements – DarkKnight Feb 07 '22 at 14:14
0

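You can use the regex library re. One way to do that is to loop through the lines and keep a line only if re.search("[^0-9 ]", line.strip()) is None. Note that re.search is needed here rather than re.match, because re.match only tests the start of the line. The character class also has to list every character you want to allow, such as . and * in the sample lines.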
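
A minimal sketch of this character-class idea follows. It uses re.search so the whole line is scanned, and strips the line ending before testing. Allowing '.' and '*' is an assumption based on the sample lines in the question, and the names DISALLOWED and keep_line are made up for the example.

```python
import re

# Any character outside this set marks the line for removal.
# Allowing '.' and '*' is an assumption based on the question's sample lines.
DISALLOWED = re.compile(r"[^0-9 .*]")


def keep_line(line):
    """Return True when the line, ignoring its line ending, has only allowed characters."""
    return DISALLOWED.search(line.rstrip("\r\n")) is None


# Sample lines from the question.
sample = [
    "-- for example\n",
    "-- 7 Febraury 2022\n",
    "5 7 1 5 3.0 2\n",
    "3*2 3 5 7.0 3\n",
]

kept = [line for line in sample if keep_line(line)]
print(kept)
```

Other characters can be allowed by adding them inside the brackets.
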

vjh