Substituting missing values in Python

Question

I want to substitute missing values (None) with the last previous known value. This is my code. But it doesn't work. Any suggestions for a better algorithm?

t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
def treat_missing_values(table):
    for line in table:
        for value in line:
            if value == None:
                value = line[line.index(value)-1]
    return table

print treat_missing_values(t)

What do you mean by "last previous known value"? Do you mean the previous non-None value in the current list ("line")? What if every previous value in the current list is None? Use the last value of the previous list ("line")? And what if the very first value is None? — Jim DeLaHunt, Jan 25 '12 at 02:55
No, just keeping the None if there is no previous value in the line. The data is separate. I saw the problem but didn't want to complicate the question. — Randomtheories, Jan 25 '12 at 03:08
**See also:** https://stackoverflow.com/questions/20248355/how-to-get-python-to-gracefully-format-none-and-non-existing-fields — dreftymac, Oct 29 '17 at 09:42
**See also:** https://stackoverflow.com/questions/35574349/python-format-string-with-custom-delimiters — dreftymac, Jan 15 '20 at 20:00

score 4 · Answer 1 · answered Jan 25 '12 at 02:57

This is probably how I'd do it:

>>> def treat_missing_values(table):
...     for line in table:
...         prev = None
...         for i, value in enumerate(line):
...             if value is None:
...                 line[i] = prev
...             else:
...                 prev = value
...     return table
... 
>>> treat_missing_values([[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]])
[[1, 3, 3, 5, 5], [2, 2, 2, 3, 1], [4, 4, 2, 1, 1]]
>>> treat_missing_values([[None, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]])
[[None, 3, 3, 5, 5], [2, 2, 2, 3, 1], [4, 4, 2, 1, 1]]

senderle

Can't say I like both changing the input and returning it. – Michael Lorton Jan 25 '12 at 03:10
@Malvolio, I agree. But I decided to stick to the input/output convention established by the question. – senderle Jan 25 '12 at 03:22
@Randomtheories, note that it's typical to _either_ return a new object _or_ alter an object passed to a function, but not both. – senderle Jan 25 '12 at 03:23
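
For the "return a new object" side of that convention, here is a minimal sketch in Python 3 syntax. The name `fill_forward` is made up for illustration and is not from the answer above; it builds fresh rows and leaves the input alone.

```python
def fill_forward(table):
    """Return a new table in which each None is replaced by the previous
    non-None value of the same row. The input table is not modified."""
    result = []
    for row in table:
        prev = None          # a leading None stays None
        new_row = []
        for value in row:
            if value is not None:
                prev = value
            new_row.append(prev)
        result.append(new_row)
    return result

t = [[1, 3, None, 5, None], [None, 2, None]]
filled = fill_forward(t)
print(filled)  # [[1, 3, 3, 5, 5], [None, 2, 2]]
print(t)       # [[1, 3, None, 5, None], [None, 2, None]]
```

The in-place variant would do the same loop but assign to `row[i]` and return nothing.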

Eric Fortin · Answer 2 · 2012-01-25T03:00:28.897

When you do an assignment in Python, you are merely making a name refer to an object in memory. You can't use value to set the object in the list, because assigning to value only makes that name refer to a different object; the list itself is left unchanged.

To do what you want, you need to set directly in the list at the right index.
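
A small illustration of that difference, written in Python 3 syntax with made-up data (the variable names are only for this example):

```python
unchanged = [1, None, 3]
changed = [1, None, 3]

# Rebinding the loop variable only changes what the name refers to.
for value in unchanged:
    if value is None:
        value = 0

# Assigning through an index modifies the list itself.
for i, value in enumerate(changed):
    if value is None:
        changed[i] = 0

print(unchanged)  # [1, None, 3]
print(changed)    # [1, 0, 3]
```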

As stated, your algorithm won't work if one of the inner lists has None as the first value.

So you can do it like this:

t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
def treat_missing_values(table, default_value):
    last_value = default_value
    for line in table:
        for index in xrange(len(line)):
            if line[index] is None:
                line[index] = last_value
            else:
                last_value = line[index]
    return table

print treat_missing_values(t, 0)
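
One thing to be aware of: `last_value` above is initialised once, before the outer loop, so a leading None in a later row is filled with the last value of the previous row. If the rows are meant to be independent (as the asker's comment suggests), a possible variant resets it for every row. This is only a sketch in Python 3 syntax, and `fill_rows_independently` is a hypothetical name.

```python
def fill_rows_independently(table, default_value=None):
    for line in table:
        last_value = default_value  # reset for every row
        for index in range(len(line)):
            if line[index] is None:
                line[index] = last_value
            else:
                last_value = line[index]
    return table

rows = [[1, None], [None, 7]]
print(fill_rows_independently(rows, 0))  # [[1, 1], [0, 7]]
```

With the default of `None`, a row that starts with None simply keeps it.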

Michael Lorton · Answer 3 · 2012-01-25T03:09:18.767

3

That thing about looking up the index from the value won't work if the list starts with None or if there's a duplicate value. Try this:

def treat(v):
    p = None  # last known non-None value in this row
    r = []
    for n in v:
        p = p if n is None else n
        r.append(p)
    return r

def treat_missing_values(table):
    return [treat(v) for v in table]

t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
print treat_missing_values(t)

This better not be your homework, dude.

EDIT A functional version for all you FP fans out there:

def treat(l):
    def e(first, remainder):
        return [first] + ([] if len(remainder) == 0 else e(first if remainder[0] is None else remainder[0], remainder[1:]))
    return l if len(l) == 0 else e(l[0], l[1:])
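
The recursive version slices the list at every step and is bounded by Python's recursion limit, so it is best read as a demonstration. A non-recursive functional sketch is possible with `itertools.accumulate`, assuming Python 3.3 or later, where `accumulate` accepts a function argument. The name `treat_accumulate` is made up here.

```python
from itertools import accumulate

def treat_accumulate(row):
    # accumulate threads the running "last known value" through the row:
    # keep the previous result when the current item is None.
    return list(accumulate(row, lambda prev, cur: prev if cur is None else cur))

print(treat_accumulate([None, 3, None, 5, None]))  # [None, 3, 3, 5, 5]
print(treat_accumulate([]))                        # []
```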

edited Jan 25 '12 at 03:09

answered Jan 25 '12 at 02:59

lol. No, not homework. I like that you have two functions instead of one. Very clean. Would you also structure the code like this when it's part of a class? – Randomtheories Jan 25 '12 at 03:12
Would I also structure the code like this when it's part of a (school) class? All the more so! You write code cleanly so it can be read by people; the computer doesn't actually care. Academic code *only exists* to be read. Really, it doesn't even have to work, so long as the person who reads the code understands what you were trying to do. – Michael Lorton Jan 25 '12 at 03:19

score 2 · Answer 4 · edited Jan 25 '12 at 04:45

That's because the index method returns the first occurrence of the argument you pass to it. In the first line, for example, line.index(None) will always return 2, because that's the first occurrence of None in that list.

Try this instead:

    def treat_missing_values(table):
        for line in table:
            for i in range(len(line)):
                if line[i] is None:
                    if i != 0:
                        line[i] = line[i - 1]
                    else:
                        #This line deals with your other problem: What if your FIRST value is None?
                        line[i] = 0 #Some default value here
        return table

Oops! left a stray variable in there. It's fixed now. I changed `value` to `i` — Joel Cornett, Jan 25 '12 at 03:02
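
To see the `index` behaviour this answer describes, here is a quick check on the first row of the question's data (Python 3 syntax):

```python
line = [1, 3, None, 5, None]

# index() always reports the first match, whichever None the loop is on.
first_none = line.index(None)
print(first_none)  # 2

# enumerate() gives the real position of every None.
positions = [i for i, v in enumerate(line) if v is None]
print(positions)   # [2, 4]
```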

Jim DeLaHunt · Answer 5 · 2012-01-25T06:52:19.173

1

I'd use a global variable to keep track of the most recent valid value. And I'd use map() for the iteration.

t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]

prev = 0
def vIfNone(x):
    global prev
    if x:
       prev = x
    else:
       x = prev
    return x

print map( lambda line: map( vIfNone, line ), t )

EDIT: Malvolio, here. Sorry to be writing in your answer, but there were too many mistakes to correct in a comment.

if x: will fail for all falsy values (notably 0 and the empty string).
Mutable global values are bad. They aren't thread-safe and produce other peculiar behaviors (in this case, if a list starts with None, it is set to the last value that happened to be processed by your code).
The re-writing of x is unnecessary; prev always has the right value.
In general, things like this should be wrapped in functions, for naming and for scoping.

So:

def treat(n):
    prev = [ None ]
    def vIfNone(x):
        if x is not None:
           prev[0] = x
        return prev[0]
    return map( vIfNone, n )

(Note the weird use of prev as a closed variable. It will be local to each invocation of treat, and global across all invocations of vIfNone from the same treat invocation, exactly what you need. For dark and probably disturbing Python reasons I don't understand, it has to be an array.)
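
To make the first point of the edit concrete, here is a small side-by-side comparison in Python 3 syntax. Both helper names are hypothetical and the row is made-up data that contains a real 0.

```python
def fill_truthy(row):
    # Mirrors the `if x:` test: any falsy value is treated as missing.
    prev, out = None, []
    for x in row:
        if x:
            prev = x
        out.append(prev)
    return out

def fill_not_none(row):
    # Only None is treated as missing.
    prev, out = None, []
    for x in row:
        if x is not None:
            prev = x
        out.append(prev)
    return out

row = [5, 0, None]
print(fill_truthy(row))    # [5, 5, 5]  (the genuine 0 is overwritten)
print(fill_not_none(row))  # [5, 0, 0]
```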

edited Jan 25 '12 at 06:52

answered Jan 25 '12 at 03:15

Good edits, Malvolio. Your modified version is much better than mine. One change: compare x using "is not None" instead of "!= None". Thanks! – Jim DeLaHunt Jan 25 '12 at 06:50
@Malvolio, it's not so dark and disturbing. Python distinguishes between local and global variables in part by assuming that if you try to rebind a variable name anywhere within a function, that variable name is local to the function. In Python 2, modifying a variable in the containing scope requires that you do as you have done, or else use a global. Python 3 adds the `nonlocal` keyword, so in Python 3, you could do `prev = None` and at the top of the definition of `vIfNone`, `nonlocal prev`. – senderle Jan 25 '12 at 14:27
@senderle -- well, it isn't dark but it is disturbing, at least in the sense that it definitively establishes that the Python experiment of not declaring local variables has failed. Between `global` and `nonlocal`, a huge fraction of variables have to be declared anyway, *and* you still have the problem of assigning to misspelled variables. At least, Python erred by assuming that undeclared variables are local; Javascript, by contrast, makes the horrifically wrong assumption that undeclared variables are global! – Michael Lorton Jan 25 '12 at 22:33
@Malvolio, ah, now I understand. It seems we disagree about some basic design decisions. I find that I spend far more time writing variable declarations in C than I spend fixing misspelling-related bugs in Python. But I guess that's just me. – senderle Jan 26 '12 at 20:08
@senderle -- that isn't a design decision, it's a contingent observation! But the correct comparison is: do you spend more time typing `var` in Javascript than fixing misspelling-related bugs in Python? Variable declarations in C represent the worst of both worlds: the inconvenience of declaration from the strictest type-checked languages with the risks of the unchecked scripting languages. Of course, since Python and Javascript aren't going to change any time soon, all this only matters to people who are writing *new* dynamic languages, who I suppose are out there, but rare. – Michael Lorton Jan 26 '12 at 23:49
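
For reference, the `nonlocal` approach senderle mentions in the comments could look roughly like this in Python 3. It is a sketch, not code from the thread: the names `treat_nonlocal` and `v_if_none` are made up, and the `list(...)` wrapper is there because `map()` returns a lazy iterator in Python 3.

```python
def treat_nonlocal(row):
    prev = None

    def v_if_none(x):
        nonlocal prev  # rebind the variable of the enclosing function
        if x is not None:
            prev = x
        return prev

    return list(map(v_if_none, row))

print(treat_nonlocal([None, 3, None, 5, None]))  # [None, 3, 3, 5, 5]
```

Each call to `treat_nonlocal` gets its own `prev`, so nothing leaks from one row to the next.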

score 0 · Answer 6 · answered Jan 25 '12 at 02:51

EDIT1

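# Your algorithm won't work if the line starts with None.
# The index > 0 check leaves a leading None in place instead of
# reading line[-1], which would be the last element of the line.
t = [[1, 3, None, 5, None], [2, None, None, 3, 1], [4, None, 2, 1, None]]
def treat_missing_values(table):
    for line in table:
        for index in range(len(line)):
            if line[index] == None and index > 0:
                line[index] = line[index-1]
    return table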

print treat_missing_values(t)
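
One caveat for any fill that reads `line[index - 1]`: at index 0 that expression is `line[-1]`, which in Python is the last element of the list rather than an error. Below is a small illustration with made-up data, followed by one possible guard. It uses Python 3 syntax, and `fill_guarded` is a hypothetical name.

```python
wrapped = [None, 3, 9]
last_via_negative_index = wrapped[0 - 1]
print(last_via_negative_index)  # 9: a leading None would be filled from the end

def fill_guarded(line):
    # Start at 1: position 0 has no predecessor, so a leading None is kept.
    for index in range(1, len(line)):
        if line[index] is None:
            line[index] = line[index - 1]
    return line

print(fill_guarded([None, 3, None]))  # [None, 3, 3]
```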

lucemia

for index in range(len(line)) – RanRag Jan 25 '12 at 02:53
use `is` for comparisons against singletons – wim Jan 25 '12 at 02:59
xrange would be even better if he uses python 2.X – Eric Fortin Jan 25 '12 at 03:02
