
How do I tell map() to selectively convert only some of the strings (not all the strings) within a list to integer values?

Input file (tab-delimited):

abc1    34    56
abc1    78    90  

My attempt:

import csv

with open('file.txt') as f:
    start = csv.reader(f, delimiter='\t')
    for row in start:
        X = map(int, row)
        print X

Error message: ValueError: invalid literal for int() with base 10: 'abc1'

When I read in the file with the csv module, each row is a list of strings:

['abc1', '34', '56']
['abc1', '78', '90']

map() obviously does not like 'abc1' even though it is a string, just like '34' is a string.

I thoroughly examined Convert string to integer using map() but it did not help me deal with the first column of my input file.

warship

3 Answers

def safeint(val):
   try:
      return int(val)
   except ValueError:
      return val

for row in start:
    X = map(safeint, row)
    print X

is one way to do it ... you can step it up even more

from functools import partial
myMapper = partial(map,safeint)
map(myMapper,start)
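
On Python 3, map() returns a lazy iterator rather than a list, so printing its result shows a map object instead of the converted values. A minimal sketch of the same idea that behaves identically on Python 2 and 3, assuming the rows are already lists of strings (the `rows` sample stands in for the csv.reader output, and `convert_row` is a hypothetical helper name, not part of the answer above):

```python
def safeint(val):
    """Return int(val) if val parses as an integer, else val unchanged."""
    try:
        return int(val)
    except ValueError:
        return val


def convert_row(row):
    # A list comprehension builds the list eagerly on both Python 2 and 3.
    return [safeint(value) for value in row]


# Sample rows standing in for what csv.reader yields from the input file.
rows = [['abc1', '34', '56'], ['abc1', '78', '90']]
converted = [convert_row(row) for row in rows]
print(converted)
```

Note that safeint converts every value that looks like an integer, so an ID column containing something like '123' would also be converted.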
Joran Beasley
  • Which would be considered more "Pythonic" (and why): this answer or Roberto Bonvallet's answer? – warship Aug 15 '14 at 05:23
  • @white_rabbit: It really depends on your data. If you're writing code specifically to deal with a format with one string and a bunch of ints, or with ints specifically in columns 1 and 2 only, etc., Roberto's answer (or mine, which is pretty much equivalent, and later; the only difference is whether they mutate the list) is better because it reflects that structure. But if you're writing code to work on a variety of different formats, each of which has an arbitrary collection of strings and ints, you _need_ Joran's answer, so it's obviously better than the others. – abarnert Aug 18 '14 at 17:34

Map only the part of the list that interests you:

row[1:] = map(int, row[1:])
print row

Here, row[1:] is a slice of the list that starts at the second element (the one with index 1) up to the end of the list.
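
A small sketch of the slice assignment on a sample row (the `row` literal is an assumption standing in for one line of the file, and `same_list` is only there to show that the list is modified in place). Slice assignment accepts any iterable, so the same line should also work on Python 3, where map() is lazy:

```python
row = ['abc1', '34', '56']
same_list = row  # second reference, to show the list is changed in place

# Replace everything from index 1 onward with the converted values.
row[1:] = map(int, row[1:])

print(row)
print(same_list is row)
```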

Roberto Bonvallet
  • Could you please explain why using `row[1:]` twice doesn't overwrite itself or get into some funky infinite loop? – warship Aug 15 '14 at 05:18
  • @XYZ927: It helps to understand if you look at how indexing is actually implemented. That line of code is equivalent to `row.__setitem__(slice(1, None), map(int, row.__getitem__(slice(1, None))))`. No infinite loop, it's just passing a slice object meaning "everything from 1 to the end" to `__getitem__`, so `__getitem__` returns a copy of everything from 1 to the end; then, later, it's passing the same slice object to `__setitem__`, so `__setitem__` replaces everything from 1 to the end with the result of the `map` call. – abarnert Aug 15 '14 at 10:33
  • What @abarnert says is correct, but a more general way to understand it is to realize that, in an assignment, the expression on the right side of the `=` sign is evaluated first, and only once the value is obtained (i.e. there's no `row[1:]` left, only the final result) does the interpreter look at the left-hand side to see where to store the result. – Roberto Bonvallet Aug 18 '14 at 15:02
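
The two explanations in the comments above can be sketched step by step. This is only an illustration with a sample row; the names `tail`, `copied`, and `converted` are introduced here and do not appear in the answer:

```python
row = ['abc1', '34', '56']
tail = slice(1, None)  # the slice object that row[1:] stands for

# Step 1: the right-hand side is evaluated first.
copied = row.__getitem__(tail)        # same as row[1:], returns a new list
converted = [int(v) for v in copied]  # row itself is still untouched here
untouched = list(row)                 # snapshot taken before the assignment

# Step 2: only now is the result stored through the left-hand side.
row.__setitem__(tail, converted)      # same as row[1:] = converted

print(untouched)
print(row)
```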

I like Roberto Bonvallet's answer, but if you want to do things immutably, as you're doing in your question, you can:

import csv

with open('file.txt') as f:
    start = csv.reader(f, delimiter='\t')
    for row in start:
        X = [row[0]] + map(int, row[1:])
        print X

… or…

numeric_cols = (1, 2)

X = [int(value) if col in numeric_cols else value
     for col, value in enumerate(row)]

… or, probably most readably, wrap that up in a map_partial function, so you can do this:

X = map_partial(int, (1, 2), row)

You could implement it as:

def map_partial(func, indices, iterable):
    return [func(value) if i in indices else value 
            for i, value in enumerate(iterable)]

If you want to be able to access all of the rows after you're done, you can't just print each one; you have to store them in some kind of structure. Which structure you want depends on how you want to refer to the rows later.

For example, maybe you just want a list of rows:

rows = []
with open('file.txt') as f:
    for row in csv.reader(f, delimiter='\t'):
        rows.append(map_partial(int, (1, 2), row))
print('The second column of the first row is {}'.format(rows[0][1]))

Or maybe you want to be able to look them up by the string ID in the first column, rather than by index. Since those IDs aren't unique, each ID will map to a list of rows:

rows = {}
with open('file.txt') as f:
    for row in csv.reader(f, delimiter='\t'):
        rows.setdefault(row[0], []).append(map_partial(int, (1, 2), row))
print('The second column of the first abc1 row is {}'.format(rows['abc1'][0][1]))
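
As an alternative to setdefault, the same grouping could be sketched with collections.defaultdict. This is a self-contained variant under stated assumptions: `sample` is an in-memory stand-in for file.txt, and `rows_by_id` is a name chosen here for illustration:

```python
import csv
import io
from collections import defaultdict


def map_partial(func, indices, iterable):
    # Apply func only to the items whose position is listed in indices.
    return [func(value) if i in indices else value
            for i, value in enumerate(iterable)]


# In-memory stand-in for file.txt (tab-delimited); an assumption for the demo.
sample = io.StringIO(u"abc1\t34\t56\nabc1\t78\t90\n")

rows_by_id = defaultdict(list)  # a missing key starts out as an empty list
for row in csv.reader(sample, delimiter='\t'):
    rows_by_id[row[0]].append(map_partial(int, (1, 2), row))

print(rows_by_id['abc1'][0][1])
```

Whether defaultdict or setdefault reads better is a matter of taste; both build the same mapping from ID to list of rows.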
abarnert
  • Your first code snippet throws an error: `TypeError: cannot concatenate 'str' and 'list' objects` – warship Aug 15 '14 at 05:15
  • You're welcome. Since this file was read in row by row, do you know of a way to grab certain values in some specific row (e.g., how would I grab the value 34)? `row[1]` just seems to grab the whole column 1, but not the specific entry in that column... – warship Aug 15 '14 at 06:48
  • @XYZ927: The loop you've written (and all the answers have followed, including mine) just deals with one row at a time. If you want to be able to go back to a specific row, you're going to need to store the rows in a list, instead of just printing them out. Then `rows[0]` is the first row, and `rows[0][1]` is the second column in the first row (that `34`). I'll edit the answer in case this isn't clear. – abarnert Aug 15 '14 at 10:22