I have a small Python script that searches the lines of a text file and returns a stripped version of each matching line (containing just the numeric value I am looking for).

To do this, I take a user input (ui), which becomes the term I search each line of the text file for.

The problem is that I can't find a way to stop it returning hits that are not full words. For example, if the user searches for "apple", I do not want it to return a line containing "applesauce".

I believe one solution would be to convert the entire line to a list of individual words and then search the list for an exact match. Would this be the best approach, or is there a simple argument I can use somewhere that would be easier?

I'm sorry, my code probably looks like a mess to everyone else. I'm a beginner with some basic VBA experience, now trying out Python, which seems to run MUCH faster for these tasks.

Thanks in advance!

#Ask for user input for variable name
print("Type variable name to be found:")    
ui = raw_input()

#use userinput as name of file to be written
write_file = ("C:\\temp\\" + ui + ".csv")

for i in cmd_line_args:
    with open(i) as dump:
        lines = dump.readlines()
        for line in lines:
            if ui.lower() in line.lower():
                line = line.replace(ui,"")
                line = line.replace("=","")

            b = ("abcdefghijklmnopqrstuvwxyz()?!£$:;@##_")
            for char in b:
                line = line.replace(char,"")
            line = line.replace(" ","")

            with open(write_file, "a") as f:
                f.write(line)
            print(line)

print("Operation complete, check " + write_file)

os.system('pause')

Heavily simplified sample data as requested:

Tested 18/01/10
USER mafs1f


ted       =     1.040864            Description
frm2      =     1.082459            Description
orm       =     0.4688  %         Description
orm2      =     -0.0469  %         Description
AFS       =     15.000  kg/h      Description
msjfg     =     7.500  kg/h      Description
msdg      =     7.500  kg/h      Description
EnvJ      =     978.00  hPa       Description
Engfh     =     1.9  degC      Description
pact      =     499.600  kPa       Description
mike_980
  • It seems you are overcomplicating your task. Can you provide a sample of the data? Just several lines. – Alex Yu Jan 11 '19 at 01:52
  • I did a double take when I saw your `for char in b:` loop. It's a very unusual method for extracting content, to say the least. – Mad Physicist Jan 11 '19 at 02:05
  • @MadPhysicist Thanks, I can't take full credit, I borrowed it on a permanent basis from some other thread ;) – mike_980 Jan 11 '19 at 02:12
  • What is the search string that showcases your "apple" vs "applesauce" example? What output do you expect to see in your file for that string? – Mad Physicist Jan 11 '19 at 02:32
  • @MadPhysicist The output is simply the value such as 1.040864, these will be pulled from multiple files and stored in a single file so they can be easily plotted in excel – mike_980 Jan 11 '19 at 02:51

1 Answer

You may need two modifications to your code. First, try:

line = line.split(" ")

This splits the string into words, assuming " " is your separator. If there are other separators, you may have to split on each of them in turn to break down every substring in line.
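If the separators vary (for example spaces, tabs and "="), one possible alternative is to split on all of them at once with re.split from the standard library. This is only a sketch: the separator set is an assumption based on the sample data in the question, and split_words is a hypothetical helper name.

```python
import re

def split_words(line):
    # Split on runs of whitespace and/or "=" characters, then drop the
    # empty strings that can appear at the start or end of the line.
    return [w for w in re.split(r"[\s=]+", line) if w]

print(split_words("ted       =     1.040864            Description"))
```

Treating a whole run of separators as one delimiter also avoids the empty strings that split(" ") produces.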

Second, you can use the == operator to compare each word in the returned list against the search term.

Here is a small snippet.

>>> x = "apple applesauce"
>>> x.split(" ")
['apple', 'applesauce']
>>> x.split(" ")[0] == "apple"
True
>>> x.split(" ")[1] == "apple"
False
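Rather than comparing one index at a time, the same idea can be written as a membership test on the split line. The sketch below assumes words are separated by whitespace and that matching should ignore case, as the question's use of lower() suggests; contains_word is a hypothetical name.

```python
def contains_word(line, word):
    # Compare against whole whitespace-separated words, ignoring case,
    # so "apple" does not match inside "applesauce".
    return word.lower() in line.lower().split()

print(contains_word("apple applesauce", "apple"))
print(contains_word("applesauce only", "apple"))
```

In the question's loop, a check like this could stand in for the substring test `ui.lower() in line.lower()`.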

EDIT 1: With the specific file you shared, first read the file

>>> file = open("path/to/file", "r")   
>>> contents = file.read()

Then split the contents by line

>>> lines = contents.split("\n")
>>> line = lines[4].split(" ")
>>> line
['ted', '', '', '', '', '', '', '=', '', '', '', '', '1.040864', '', '', '', '', '', '', '', '', '', '', '', 'Description']

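You can also clean the line by filtering out the empty strings, for example with `[w for w in line if w != ""]`. Note that `line.remove(x)` takes a single value and removes only its first occurrence, so it cannot drop every empty string in one call.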

To check whether a string is numeric, you can use this function (note that this is not very elegant or efficient, so it is not recommended for very long lines):

>>> def is_number(s):
...     try:
...         float(s)
...         return True
...     except ValueError:
...         return False

Then you can check whether each item in the line is numeric:

>>> for i in line:
...     print( is_number(i))
... 
False
False
False
False
False
False
False
False
False
False
False
False
True
False
False
False
False
False
False
False
False
False
False
False
False
>>> 

So now you just have to return the item for which is_number gives True.
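Putting the pieces together, one way this might look is sketched below. It assumes the variable name is always the first word on the line, as in the sample data, and that the first numeric word after it is the value wanted; find_value is a hypothetical helper, not part of the original script.

```python
def is_number(s):
    # True if s can be parsed as a float, e.g. "1.040864" or "-0.0469".
    try:
        float(s)
        return True
    except ValueError:
        return False

def find_value(line, name):
    # Return the first numeric word on a line whose first word is exactly
    # `name` (ignoring case); return None if the line does not match.
    words = line.split()
    if not words or words[0].lower() != name.lower():
        return None
    for w in words[1:]:
        if is_number(w):
            return w
    return None

print(find_value("ted       =     1.040864            Description", "ted"))
print(find_value("frm2      =     1.082459            Description", "frm"))
```

Because the first word is compared with ==, searching for "frm" does not match the "frm2" line.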

newkid
  • Thanks for the reply, I assume if I use the split method like you suggest I would then just provide the index (is that the correct term?) of the list item I want to return to the output file? – mike_980 Jan 11 '19 at 02:16
  • Thank you so much, this is enough I think for me to make the changes I need. Thank you! – mike_980 Jan 11 '19 at 02:47
  • You can use `line.split()` instead of `line.split(' ')` to avoid dealing with all the empty strings. – Mad Physicist Jan 11 '19 at 02:55