I have a very large file (~20 GB) from which I want to read specific lines which represent a matrix. The data file for three 2 x 2 matrices looks like:

2 3
1 3
2 2
1 2
3 2
3 4

Currently I am using the following approach (from here) where I get a list of strings.

from itertools import islice
import matplotlib.pyplot as plt

n = 2 # matrix size
t = 3 # number of matrices
file = open("data")
t = 0;
with file as f:
    while t < 3:
        t=t+1
        next_n_lines = list(islice(f, n))
        print(next_n_lines)
        plt.matshow(next_n_lines)
        plt.show()
        if not next_n_lines:
            break
        # process next_n_lines

But how do I get floats instead of a list of strings? I don't see it, but it can't be so hard.

Gilfoyle

3 Answers

Just call .split() on each line and apply float to every piece. I use list comprehensions here, but map works just as well:

In [29]: from itertools import islice
    ...: n = 2 # matrix size
    ...: t = 3 # number of matrices
    ...: with open('data') as f:
    ...:     for _ in range(t):
    ...:         s = islice(f, n)
    ...:         M = [[float(x) for x in line.split()] for line in s]
    ...:         print(M)
    ...:
[[2.0, 3.0], [1.0, 3.0]]
[[2.0, 2.0], [1.0, 2.0]]
[[3.0, 2.0], [3.0, 4.0]]

Also note, it is a lot cleaner to use a for-loop rather than a while-loop.
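
For a file too large to hold in memory, the same idea can be wrapped in a generator so that only one matrix is read at a time. This is a minimal sketch, not part of the answer above: the function name read_matrices and its parameters are hypothetical, and it assumes every matrix is square (n x n), whitespace-separated, and that blank lines carry no data.

```python
from itertools import islice


def read_matrices(path, n):
    """Yield successive n x n matrices (lists of float rows) from a text file.

    Hypothetical helper: assumes whitespace-separated numbers, n rows per
    matrix, and that blank lines can be ignored.
    """
    with open(path) as f:
        # Lazily drop blank lines; nothing is read until rows are requested.
        lines = (line for line in f if line.strip())
        while True:
            # islice pulls at most n lines, so one matrix is in memory at a time.
            rows = [[float(x) for x in line.split()] for line in islice(lines, n)]
            if not rows:
                return  # end of file reached
            if len(rows) != n or any(len(row) != n for row in rows):
                raise ValueError("incomplete or malformed matrix block")
            yield rows
```

Because the generator is lazy, a single matrix can be picked out by position with something like next(islice(read_matrices('data', 2), k, None)), although the file is still scanned line by line up to that point.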

juanpa.arrivillaga

Extended solution:

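import itertools
import matplotlib.pyplot as plt

n = 2      # matrix size
num_m = 3  # number of matrices
with open('data', 'r') as f:
    for _ in range(num_m):
        # Read the next n lines and convert each one to a list of floats
        items = [list(map(float, line.split())) for line in itertools.islice(f, n)]
        if not items:  # stop early if the file has fewer matrices than expected
            break
        plt.matshow(items)
        plt.show()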

The output:

[Images: three matshow plots, one per matrix]

RomanPerekhrest

NumPy's fromfile can be useful here:

import matplotlib.pyplot as plt
import numpy as np

n = 2 # matrix size
t = 3 # number of matrices

with open('data') as fobj:
    for _ in range(t):
        try:
            numbers = np.fromfile(fobj, count=n * n, sep=' ').reshape(n, n)
            plt.matshow(numbers)
            plt.show()
        except ValueError:
            break

Yields the desired output:

[Images: three matshow plots, one per matrix]

Mike Müller