
Among other things, my project requires retrieving distance information from a file, converting the data into integers, and then adding them to a 128 x 128 matrix.

I am at an impasse while reading the data from each line.

I retrieve it with:

distances = []

with open(filename, 'r') as f:
    for line in f:
        if line[0].isdigit():
            distances.extend(line.splitlines())

This produces a list of strings.

Calling int() on the whole list fails, but on a single element it works:

int(distances)     # does not work
int(distances[0])  # produces the correct integer when called through the console

However, the spaces foul up the procedure later on. An example of the list:

['966']['966', '1513' 2410'] # the distance list increases with each additional city. The first item is actually the distance of the second city from the first. The second item is the distance of the third city from the first two. 

int(distances[0]) #returns 966 in console. A happy integer for the matrix. However:
int(distances[1]) # returns:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '1513 2410'

I have a slight preference for more Pythonic solutions, such as list comprehensions, but in reality any and all help is greatly appreciated.

Thank you for your time.

Nikki
  • You seem to have added another quote after `'1513'` that isn't there in the `ValueError` output. – kojiro Mar 26 '14 at 20:53
  • related: [Python 3.3 code example that find the sum of all integers in a file](http://stackoverflow.com/a/20024735/4279) – jfs Mar 26 '14 at 21:09
  • Thank you for that approach; it gives me a good idea of how to build my matrix. After the distances are loaded into it, it should be able to give the distance for any two given cities. That looks good. – Nikki Mar 26 '14 at 21:41

2 Answers


All the information you get from a file is a string at first. You have to parse the information and convert it to different types and formats in your program.

  • int(distances) does not work because, as you have observed, distances is a list of strings. You cannot convert an entire list to an integer. (What would be the correct answer?)
  • int(distances[0]) works because you are converting only the first string to an integer, and the string represents an integer so the conversion works.
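  • int(distances[1]) doesn't work because that element is the single string '1513 2410'. line.splitlines() only splits at line breaks, so both numbers from the same line of the file end up in one string. (The extra quote after '1513' in the example list is a typo, as the ValueError message shows.) This string cannot be converted to an integer because it contains a space.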
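The diagnosis in the last bullet can be checked in a console. This sketch assumes the second line of the file is the string 1513 2410 (reconstructed from the error message, not taken from the actual file) and compares the two string methods:

```python
# Hypothetical line from the distance file: two distances separated by a space
line = "1513 2410\n"

# splitlines() only breaks at line boundaries, so both numbers stay together
by_lines = line.splitlines()
print(by_lines)  # ['1513 2410']

# split() with no argument breaks on any run of whitespace (spaces, tabs, newlines)
by_words = line.split()
print(by_words)  # ['1513', '2410']

# Only the whitespace-separated pieces are valid integer literals
numbers = [int(word) for word in by_words]
print(numbers)  # [1513, 2410]

# Converting the unsplit string reproduces the error from the question
try:
    int(by_lines[0])
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: '1513 2410'
```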

There are a few different solutions that might work for you, but here are a couple of obvious ones for your use case:

distances.extend([int(elem) for elem in line.split()])

This will only work if you are certain that every element of the list returned by line.split() can undergo this conversion. Alternatively, you can collect the strings with distances.extend(line.split()) and convert the whole list at once afterwards:

distances = [int(d) for d in distances]

or

distances = list(map(int, distances))  # in Python 3, map() returns a lazy iterator, so list() is needed for a list

You should try a few solutions out and implement the one you feel gives you the best combination of working correctly and readability.
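Put together, the per-line approach looks roughly like this. It is a minimal sketch assuming Python 3; io.StringIO stands in for the real file, and its contents are invented to match the example list in the question. With real data, loop over open(filename) inside a with block instead.

```python
import io

# Stand-in for the real file; contents are invented to match the question's
# layout (one more distance on each successive line)
sample = io.StringIO("966\n1513 2410\n")

distances = []
for line in sample:
    # Same filter as the question: skip lines that don't start with a digit
    if line[0].isdigit():
        # split() separates the space-delimited numbers; int() converts each one
        distances.extend(int(elem) for elem in line.split())

print(distances)  # [966, 1513, 2410]

# Alternative: collect strings first and convert afterwards.
# In Python 3, map() is lazy, so wrap it in list() to get a real list.
raw = "966\n1513 2410\n".split()
converted = list(map(int, raw))
print(converted)  # [966, 1513, 2410]
```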

Two-Bit Alchemist
  • `line` as the name implies is a single line; you shouldn't call `.splitlines()` on it. If there is one integer per line then `int(line)` should work. Judging by the example list, there could be multiple integers in the line. If they are space-separated then `distances.extend(map(int, line.split()))` – jfs Mar 26 '14 at 20:59
  • Thank you - mindlessly copied and pasted that and did not fix. – Two-Bit Alchemist Mar 26 '14 at 21:04
  • That was a very illuminating response, thank you. I haven't got them to work yet, I'll try J.F.'s suggestion and see if that resolves the situation. – Nikki Mar 26 '14 at 21:26
  • Thank you for the feedback. One thing my answer does not mention is the generator approach in the other answer. This can be very important if you are using very large lists because while a list is constructed all at once in RAM (which is slow and intensive), the iterators and generators spit out one element at a time as needed. For short lists, this will be fine. – Two-Bit Alchemist Mar 26 '14 at 21:28
  • Implementing J.F.'s fix and we have integers! Good show. Thank you, gentlemen. Now I can work with that matrix, at last. – Nikki Mar 26 '14 at 21:37

My guess is you want to split on all whitespace, rather than newlines. If the file's not large, just read it all in:

with open('file') as f:
    distances = list(map(int, f.read().split()))

If some of the values aren't numeric:

distances = (int(word) for word in open('file').read().split() if word.isdigit())
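To see what the isdigit() filter keeps, here is a small sketch with invented file contents that include a text header. It uses square brackets (a list comprehension) instead of the generator so the result can be printed directly:

```python
import io

# Invented file contents: a text header mixed in with the numbers
sample = io.StringIO("Distances between cities\n966\n1513 2410\n")

# Keep only words made entirely of digits; header words are skipped
distances = [int(word) for word in sample.read().split() if word.isdigit()]
print(distances)  # [966, 1513, 2410]

# Caveat: isdigit() is False for negative or decimal values,
# so such words would be dropped silently rather than raise an error
print("-5".isdigit(), "12.5".isdigit())  # False False
```

For non-negative integer distances this is probably fine, but the silent skipping is worth knowing about if the file format ever changes.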

If the file is very large, use a generator to avoid reading it all at once:

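import itertools
with open('file') as dists:
    # chain yields one int at a time without reading the whole file into memory
    distances = itertools.chain.from_iterable(
        (int(word) for word in line.split()) for line in dists)
    # consume distances inside this block; the file is closed once it exits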
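Since the asker's goal is a 128 x 128 matrix, the flat stream from chain.from_iterable can be consumed row by row. This sketch assumes the layout described in the question: line 1 holds the distance from the second city to the first, line 2 the distances from the third city to the first two, and so on. The helper name build_matrix and the three-city sample are hypothetical.

```python
import io
import itertools


def build_matrix(dists, n):
    """Fill a symmetric n x n matrix from a lower-triangular distance stream.

    Hypothetical layout (from the question's description): data line i
    (1-based) holds the i distances from city i to cities 0..i-1, with
    cities numbered from 0.
    """
    values = itertools.chain.from_iterable(
        (int(word) for word in line.split()) for line in dists)
    matrix = [[0] * n for _ in range(n)]  # diagonal stays 0
    for i in range(1, n):
        # islice takes exactly the i values that belong to city i
        row = list(itertools.islice(values, i))
        if len(row) != i:
            raise ValueError(f"expected {i} distances for city {i}, got {len(row)}")
        for j, d in enumerate(row):
            matrix[i][j] = matrix[j][i] = d  # distance is symmetric
    return matrix


# Invented three-city sample standing in for the real file
sample = io.StringIO("966\n1513 2410\n")
m = build_matrix(sample, 3)
for r in m:
    print(r)
# [0, 966, 1513]
# [966, 0, 2410]
# [1513, 2410, 0]
```

Because values are pulled one at a time with itertools.islice, only their order matters, not how they are wrapped across lines. With the real file it would be called as build_matrix(dists, 128) inside the with block, so the file is still open while the stream is consumed.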
kojiro