
How can I get a Python list from a text file with the following contents?

'hallo'
'hallo\n'
'\x00' * 1
100
'400 + 2'
400 + 2

The expected result is:

ll = ["hallo", "hallo\n", "\x00", 100, 402, 402]

with the types:

[str, str, str, int, int, int]

That is, every value that Python understands as an int should become an int, even if it is written as a quoted string like '400 + 2'.

I tried to use eval, but it has difficulties with \n and \x00.

The input (the lines to convert) can be assumed to be safe.
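A minimal sketch of one way to do this, assuming (as stated above) that the file is trusted and contains one Python expression per line. The names `parse_lines` and `load_list` are hypothetical. Each line is evaluated with `eval` in an empty namespace. If the result is a string that itself evaluates to an int (like the quoted `'400 + 2'`), the int is used instead. A line that fails to evaluate raises an exception, and the empty namespace is not a sandbox, so this is only for trusted input.

```python
def parse_lines(lines):
    """Evaluate each non-empty line as a Python expression (trusted input only).

    If a line evaluates to a string that itself evaluates to an int
    (e.g. the quoted line '400 + 2'), that int is returned instead.
    """
    no_builtins = {"__builtins__": {}}  # names like len or open are not resolvable
    result = []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        value = eval(text, no_builtins, {})  # assumes the file is trusted
        if isinstance(value, str):
            try:
                inner = eval(value, no_builtins, {})
            except Exception:  # NameError, SyntaxError, ValueError, ...
                inner = None
            if type(inner) is int:  # exact int check, so bools are excluded
                value = inner
        result.append(value)
    return result


def load_list(path):
    """Read a text file (path is up to you) and parse it line by line."""
    with open(path, encoding="utf-8") as f:
        return parse_lines(f)
```

Usage would be something like `ll = load_list("input.txt")`, where the file name is only an example.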

Moshe
mr.wolle
  • Do you only want to convert strings and numbers, or do you plan to eval any Python object? – Eric Duminil Apr 04 '17 at 20:55
  • How would you possibly decide what string stays a string and what gets evaluated? I.e. why does `'400 + 2'` become an evaluated number, how do you decide that? – Anyway, you need to write some smaller parser for this to detect what you want to do with the input. Once you have that, there shouldn’t be a problem evaluating the input according to your decision. The question as it stands is kind of too broad for SO. – poke Apr 04 '17 at 20:57
  • From the question you have stated, I agree with poke: it is a little too broad, and you should define what you want to do with the input in each case. It looks like it's heading toward regular expressions in a loop/iterator at the moment. Is that list exhaustive in terms of inputs? – PythonTester Apr 04 '17 at 21:01
  • You keep changing your input to the point where it’s no longer clear what you are trying to do. Please actually clarify what you are trying to do; not just what the results from your examples are supposed to do, but what actually happens with the input. – poke Apr 04 '17 at 21:50
  • Sorry for specializing the question multiple times. It should now be fine. – mr.wolle Apr 04 '17 at 21:56

3 Answers


WARNING: Using eval is dangerous. Be very careful with it or, better yet, find an alternative that doesn't use it.

That being said, you could define a regex to check whether the string looks like something you'd like to eval. For example, anything made up only of digits, dots, spaces and the operators + - * / could be deemed safe enough:

import re

l = ['hallo', 'hallo\n', '\x00' * 1, '100', 100, '400 + 2', '400 + - ', 400 + 2]

# Only digits, dots, spaces and + - * / are allowed.
# Raw string, so that \A, \d and \Z aren't treated as (invalid) string escapes.
EXPRESSION = re.compile(r'\A[\d.\-+*/ ]+\Z')


def string_or_expression(something):
    if isinstance(something, str) and EXPRESSION.match(something):
        try:
            return eval(something)
        except Exception:  # e.g. SyntaxError for '400 + - '
            return something
    return something

print([string_or_expression(s) for s in l])
# ['hallo', 'hallo\n', '\x00', 100, 100, 402, '400 + - ', 402]

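With Python 3 versions before 3.7, you might use ast.literal_eval, which is considerably safer than a plain eval because it only accepts literals. Since Python 3.7, literal_eval no longer adds arbitrary numbers, so '400 + 2' is returned unchanged: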

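import re
import ast

l = ['hallo', 'hallo\n', '\x00' * 1, '100', 100, '400 + 2', '400 + - ', 400 + 2]

# Same whitelist as above; raw string avoids invalid-escape warnings.
EXPRESSION = re.compile(r'\A[\d.\-+*/ ]+\Z')


def string_or_expression(something):
    if isinstance(something, str) and EXPRESSION.match(something):
        try:
            return ast.literal_eval(something)
        except (ValueError, SyntaxError):  # non-literal or malformed input
            return something
    return something

print([string_or_expression(s) for s in l])
# Python 3 before 3.7: ['hallo', 'hallo\n', '\x00', 100, 100, 402, '400 + - ', 402]
# Python 3.7+:         ['hallo', 'hallo\n', '\x00', 100, 100, '400 + 2', '400 + - ', 402]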

Yet another alternative would be to use @poke's "expression evaluation algorithm" (linked in one of his comments), since literal_eval doesn't understand '2 * 3'.
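To see what literal_eval does and doesn't accept, here is a small check. The helper name `try_literal` is made up for illustration. Plain literals, including escape sequences inside quotes, are parsed. Operators such as `*` are rejected with a ValueError, and invalid syntax raises a SyntaxError.

```python
import ast


def try_literal(text):
    """Return ast.literal_eval(text), or a 'rejected (...)' marker string."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        return f"rejected ({type(exc).__name__})"


print(repr(try_literal("'hallo\\n'")))   # escape sequence is interpreted
print(repr(try_literal("100")))
print(repr(try_literal("2 * 3")))        # multiplication is not a literal
print(repr(try_literal("'\\x00' * 1")))  # neither is string repetition
```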

Finally, even a "safe" expression like '2**2**2**2**2**2**2**2**2**2' could bring your server down.
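One way to avoid eval entirely is to walk Python's own AST and allow only a whitelist of nodes. This is a different technique from poke's linked algorithm, and the function name `safe_arith` is hypothetical. The sketch below accepts int/float constants, unary + and -, and the binary operators + - * /, and raises ValueError for everything else. Leaving out `**` sidesteps the `2**2**2...` problem. It assumes Python 3.8+ (for `ast.Constant`). Division by zero still raises ZeroDivisionError.

```python
import ast
import operator

# Whitelisted operators; ** is deliberately absent (see the 2**2**... example).
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_arith(text):
    """Evaluate + - * / over int/float literals; raise ValueError otherwise."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed syntax: {type(node).__name__}")

    try:
        tree = ast.parse(text.strip(), mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"not an expression: {text!r}") from exc
    return walk(tree)
```

It could replace the eval call in string_or_expression, catching ValueError (and ZeroDivisionError) to fall back to the original string.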

Community
Eric Duminil
  • You don't actually need to evaluate the math by the time you've written that regex. You could declare that an expression that "looks like math" is probably a number. This kind of approximation is much safer than using eval, and well worth the tradeoff IMO. – kojiro Apr 04 '17 at 21:05
  • @kojiro: That would be nice, but how would you convert `'400 + 2'` to `402` then? – Eric Duminil Apr 04 '17 at 21:08
  • Ah, nice, OP changed the question on me. Or perhaps I misunderstood it from the get-go. I thought OP was asking for an interpretation of the types. – kojiro Apr 04 '17 at 21:21
  • Adding a comment *“Be very careful here!!!”* does not really do anything helpful. Once you use `eval` on user input, you have already lost and cannot be careful anymore. – poke Apr 04 '17 at 21:34
  • @poke: True. Still, I left the comment so that if OP plays with the solution and copy-pastes it somewhere else, there's a clear reminder of a very dangerous method call. I'd be interested to know what the most dangerous expression could be with just digits and basic operators. – Eric Duminil Apr 04 '17 at 21:37
  • `string = '9 ** 9999999999999999999999999999999999999999'` – See you later, Python process. (Not to mention expressions that throw exceptions) – poke Apr 04 '17 at 21:42
  • @poke: Thanks for the tip about incorrect mathematical expressions, I didn't see your comment edit at first. – Eric Duminil Apr 04 '17 at 22:04
  • @EricDuminil The input strings are read from outside, e.g. from a text file. – mr.wolle Apr 04 '17 at 22:07
  • @EricDuminil I think yes: how will you get `'\x00'*1` into your list `l`? – mr.wolle Apr 04 '17 at 22:11
  • @EricDuminil: Yes, the input is `"'\x00' * 1"`. Just a string. – mr.wolle Apr 04 '17 at 22:18
  • _"I have a list of strings (the `'`'s are really there)"_ – mr.wolle Apr 04 '17 at 22:22
  • @EricDuminil No, why would they be needed? It is just text. – mr.wolle Apr 04 '17 at 22:27
  • @mr.wolle: How should I know it's just text if there's no quote around? You have `'400 + 2'` and `400 + 2`. One is a string, the other an int. The same thing goes for `"'\x00' * 1"` and `'\x00' * 1`. One is a string, the other is a Python expression equal to `'\x00'`. It looks like you don't have a list of strings, but just a big string with all those expressions, written in a file. They're all strings, even though they might look like ints or expressions. It would have been nice to write it in the question. Anyway, it's late here. Good night. – Eric Duminil Apr 04 '17 at 22:31
  • @EricDuminil Sorry for the missing clarity. Yes, it is just a text. Not a python list with the values. – mr.wolle Apr 04 '17 at 22:40

How about:

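def try_eval(x):
    try:
        res = eval(x)
    except Exception:  # e.g. NameError, SyntaxError, ValueError
        res = x
    return res

l = ['hallo', 'hallo\n', '\x00', '100', '400 + 2']  # assumed input; the answer doesn't show l

print([try_eval(x) for x in l])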

output:

['hallo', 'hallo\n', '\x00', 100, 402]
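One caveat with the plain eval above: names are looked up in the caller's variables and in the builtins, so an input line such as `len` or `l` would come back as the builtin function or the list itself instead of as text. A minimal variation (the name `try_eval_isolated` is hypothetical) passes an empty namespace so that bare words raise NameError and stay strings. This does not make eval safe for untrusted input; an empty `__builtins__` is not a sandbox.

```python
def try_eval_isolated(x):
    """Like try_eval, but evaluate with no builtins and no surrounding variables."""
    try:
        return eval(x, {"__builtins__": {}}, {})
    except Exception:  # NameError, SyntaxError, ValueError, ...
        return x


print(eval("len"))                   # <built-in function len>, not the text 'len'
print(try_eval_isolated("len"))      # len
print(try_eval_isolated("400 + 2"))  # 402
```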
Binyamin Even

Let's get serious about avoiding dangerous eval here >:)

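# Python 2 only: the compiler module was removed in Python 3.
import compiler

def is_math(expr):
    """Return True if the expression smells mathematical."""

    try:
        module = compiler.parse(expr)
        stmt, = module.getChildNodes()
        discard, = stmt.getChildNodes()
        code, = discard.getChildNodes()
        return not isinstance(code, compiler.ast.Name)
    except (ValueError, TypeError, SyntaxError):
        # ValueError: the AST doesn't have the expected one-child shape
        # SyntaxError: the text isn't valid Python, e.g. '400 + - '
        return False

t = [eval(s) if is_math(s) else s for s in l]  # l: the list of input strings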

Yes, I made a couple of assumptions here, but you can adjust them to be as strict as you really need. The AST is pretty easy to understand. When you parse, you get a Module. Inside the Module is a Stmt. Inside that is (most likely) a Discard node, which just means the expression's value isn't used anywhere.

If it isn't a Discard, we assume it's a string. For one thing, this is likely to prevent dangerous side effects from eval. (Someone prove me wrong here: wrap a dangerous expression in discard code.)

Inside that is the meat of your expression. From there I assume that anything that is a plain string will appear as a Name in the AST. Anything that isn't a Name is probably a number or a math operation.

I think eval should be safe at this point, and eval is needed if the expression really is math.
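Since the compiler module only exists in Python 2, here is a rough Python 3 equivalent of the check above using the ast module (a sketch, not kojiro's code). Parsing in "eval" mode plays the role of unwrapping Module/Stmt/Discard, since it only accepts a single expression. The bare-name test is the same. The list `l` is an assumed example. As the comments point out, this only decides when to call eval and does not make eval safe: a function call, for instance, is not a Name.

```python
import ast


def is_math(expr):
    """Return True if expr is a single expression that is not a bare name."""
    try:
        tree = ast.parse(expr.strip(), mode="eval")  # one expression only
    except (SyntaxError, ValueError):  # invalid syntax or null bytes
        return False
    return not isinstance(tree.body, ast.Name)


l = ['hallo', 'hallo\n', '\x00', '100', '400 + 2', '400 + - ']  # assumed example input
t = [eval(s) if is_math(s) else s for s in l]
print(t)  # ['hallo', 'hallo\n', '\x00', 100, 402, '400 + - ']
```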

kojiro
  • OP needs more information than just `number` or `string`. Still, your method is interesting. – Eric Duminil Apr 04 '17 at 21:24
  • Updated, somewhat hesitantly. Given my new understanding of the question I don't think eval is avoidable. (Even if I unparsed the AST and executed it, the result would be essentially eval.) – kojiro Apr 04 '17 at 21:33
  • You could write your own expression evaluator, like I’ve [outlined in one of my answers](http://stackoverflow.com/a/20748308/216074). Then you’re completely safe on the evaluating side of the question (I still consider the detection part rather difficult/unclear). – poke Apr 04 '17 at 21:39
  • Seems one can defeat this rather easily using parentheses. I will have to try harder… – kojiro Apr 05 '17 at 21:26