Write a function that reads a file and returns all of the numbers in it as a list of floats

Question

I have a large text file containing many thousands of lines, but a short example that covers the same idea is:

vapor dust -2C pb 
x14 71 hello! 42.42
100,000 lover baby: -2

There is a mixture of integers, alphanumerics, and floats.

Attempt at a solution: I've done this to create a single list of strings, but I am unable to filter each item based on whether it is numeric or alphanumeric.

with open('file.txt', 'r') as f:
    data = f.read().split()
#dirty = [x for x in data if x.isnumeric()]
print(data)

The line #dirty fails.

I have had luck constructing a list-of-lists containing almost all required values using the code as follows:

with open('benzene_SDS.txt', 'r') as f:
    for word in f:
        data = word.split()
        clean = [x for x in data if x.isnumeric()]
        res = list(set(data).difference(clean))
        print(clean)

But it doesn't return a single list; it is a list of lists, most of which are blank [].

There was a hint given that using the "try" control statement is useful in solving the problem, but I don't see how to utilize it.

Any help would be greatly appreciated! Thanks.

From your own example data, what do you expect the output to be? I.e. should 14 be included? And how about 100,000? — Grismar, Jan 17 '22 at 07:20
@Grismar 14 is not included but the 100 would be, from the assignment: "The function should identify numbers like "10823," that have a comma or other character after them. # REQ2: Numbers with hyphens (or other non-numeric characters) within them like x14 or 727-8989 should be skipped." — Himi Chan, Jan 17 '22 at 07:44
Hi, if it is any easier, we can assume that we only need the true integers and floating-point numbers in the example provided. I am mostly confused about how to incorporate the try statement. Sorry if my knowledge is bad! It is still new to me. — Himi Chan, Jan 17 '22 at 07:49
You should probably add an example of the expected output to your question, like I asked. For example, `'100,000'` would be considered a valid way to write `100000` for many regional settings, while for other regional settings, it might be considered `100.000`. It sounds like you only want entirely numeric values that comply with local regional settings, but values can be separated by both spaces and other separators like commas - it's unclear what would be valid separators though. How about `'123; 45-50, 60!'`? — Grismar, Jan 17 '22 at 07:49
@Grismar the prompt states the function should be able to identify numbers like "2019," that have a comma or other character after them, and that numbers with non-numeric characters within them should be skipped. So for your example, the function would return [123.0, 60.0]. — Himi Chan, Jan 17 '22 at 07:53

score 0 · Answer 1 · answered Jan 17 '22 at 07:21


numbers = []
with open('file.txt', 'r') as f:
    # Iterate over lines; iterating over f.read() would yield single characters
    for line in f:
        words = line.split()
        numbers.extend([word for word in words if word.isnumeric()])

# Print all numbers
print(numbers)

# Print all unique numbers
print(set(numbers))

# Print all unique numbers, converted to floats
print([float(n) for n in set(numbers)])

If you specifically need a list then you can wrap the set with list().
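As a quick illustration of what `str.isnumeric` accepts, here is a minimal check on a few tokens from the example file in the question. The `tokens` and `passes` names are made up for this sketch.

```python
# Tokens taken from the example file in the question.
tokens = ["71", "42.42", "-2", "100,000", "x14"]

# str.isnumeric() is True only when every character is a numeric character,
# so a sign, a decimal point, or a comma makes the whole token fail.
passes = [t for t in tokens if t.isnumeric()]
print(passes)
```

Only `'71'` passes, so floats and negative numbers are dropped by this filter.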

liveware


Note that `'100,000'` is not numeric, according to `.isnumeric`; it's unknown if OP wants numbers like `14` included, but of course that would be missed as well. – Grismar Jan 17 '22 at 07:24
Hi, this is close, but I need the full number values, not their individual components, such as 71 or 42.42 in the example. – Himi Chan Jan 17 '22 at 07:46

Grismar · Accepted Answer · 2022-01-17T07:55:55.167

If you're mainly asking how one would use try to check for validity, this is what you're after:

values = []
with open('benzene_SDS.txt', 'r') as f:
    for word in f.read().split():
        try:
            values.append(float(word))
        except ValueError:
            pass
print(values)

Output:

[71.0, 42.42, -2.0]

However, note that this does not parse '100,000' as either 100 or 100000.

This code would do that:

import locale

locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')

values = []
with open('benzene_SDS.txt', 'r') as f:
    for word in f.read().split():
        try:
            values.append(locale.atof(word))
        except ValueError:
            pass

print(values)

Result:

[71.0, 42.42, 100000.0, -2.0]

Note that running the same code with this:

locale.setlocale(locale.LC_ALL, 'nl_NL.UTF-8')

yields a different result:

[71.0, 4242.0, 100.0, -2.0]

This is because the Netherlands uses `,` as the decimal separator and `.` as the thousands separator (which is simply ignored in 42.42).
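Since the title asks for a function, here is a minimal sketch that wraps the plain `float()` approach in one. The name `read_floats` is made up for this example, and stripping trailing punctuation with `rstrip(string.punctuation)` is an assumption about what the assignment means by numbers "that have a comma or other character after them"; it is not something the assignment text confirms.

```python
import string


def read_floats(path):
    """Return every whitespace-separated token in the file that parses as a float."""
    values = []
    with open(path, "r") as f:
        for word in f.read().split():
            # Assumption: only punctuation at the end of a token is dropped,
            # so "2019," becomes "2019" while "x14" and "727-8989" stay as-is.
            candidate = word.rstrip(string.punctuation)
            try:
                values.append(float(candidate))
            except ValueError:
                # Not a number (for example "x14", "-2C" or "100,000"): skip it.
                pass
    return values
```

For a file containing `123; 45-50, 60!` this should return `[123.0, 60.0]`. Two caveats: `'100,000'` is still skipped, because this sketch does not use `locale`, and `float()` also accepts tokens such as `nan`, `inf`, or `1e5`, which may or may not be wanted.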

If this answers your question, consider ticking the checkmark to turn it green, to indicate your question no longer requires additional answers. — Grismar, Jan 17 '22 at 07:56
However, note that this solution does not deal with punctuation, so numbers followed by characters other than `.` or `,` (or whatever is locally accepted) would still be ignored, e.g. numbers followed by a question mark or exclamation point. You would likely need a regular expression to parse numbers from that, but it would be substantially more challenging. — Grismar, Jan 17 '22 at 07:58
