
A coworker left some data files I want to analyze with Numpy.

Each file is a matlab file, say data.m, and has the following format (but with many more columns and rows):

values = [-24.92 -23.66 -22.55 ;
-24.77 -23.56 -22.45 ;
-24.54 -23.64 -22.56 ;
];

which is the typical explicit matrix creation syntax used by matlab.

My question is: what would be the most practical way to create a numpy array from these files?

I can think of a "brute force" or "quick and dirty" solution, but if there is a more straightforward one, such as a standard function from numpy or even from another module, I would much rather use it.

EDIT: I noticed that my files may contain NaN values, so I will most likely adapt the answers to use numpy.genfromtxt instead of numpy.loadtxt. I plan to post my final code as soon as I have it.

Thanks for any help!

EDIT: I ended up with the following code, which uses a regex to grab everything between the brackets and then builds a numpy array with genfromtxt so that NaN values are handled. A shorter solution would be the fromstring function, which does not need StringIO, but it cannot handle NaN, and my data have NaN :oP

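#!/usr/bin/env python
# coding: utf-8

import io
import re

import numpy

with open('data.m') as f:
    # Everything between the first '[' and the last ']'
    s = re.search(r'\[(.*)\]', f.read(), re.DOTALL).group(1)
# Turn the ';' row terminators into newlines; otherwise genfromtxt
# reads each ';' as an extra column and fills it with NaN.
s = s.replace(';', '\n')
a = numpy.genfromtxt(io.StringIO(s), missing_values='NaN', filling_values=numpy.nan)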
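To check this approach without a file on disk, the same regex plus genfromtxt idea can be wrapped in a function that takes the file's text. This is a sketch: the helper name `parse_m_matrix` and the sample text (with one NaN added) are made up for illustration, and it assumes the file holds a single `name = [ ... ];` literal.

```python
import io
import re

import numpy as np

# Hypothetical sample mirroring data.m, with one NaN entry added.
SAMPLE = """values = [-24.92 -23.66 -22.55 ;
-24.77 NaN -22.45 ;
-24.54 -23.64 -22.56 ;
];
"""

def parse_m_matrix(text):
    """Parse the body of a single matlab `name = [ ... ];` literal into an array."""
    # Grab everything between the first '[' and the last ']'.
    body = re.search(r'\[(.*)\]', text, re.DOTALL).group(1)
    # Each row ends with ';'. Turning it into a newline keeps genfromtxt from
    # reading the ';' as an extra, NaN-filled column; the blank lines it
    # leaves behind are skipped.
    body = body.replace(';', '\n')
    # genfromtxt converts the literal token 'NaN' to numpy.nan.
    return np.genfromtxt(io.StringIO(body))

a = parse_m_matrix(SAMPLE)
print(a)
```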
heltonbiker

2 Answers


Here are a couple of options, although neither is built in.

The solution you probably will not find acceptable

This solution probably falls into your "quick and dirty" category, but it leads into the next solution.

Remove the leading values = [ and the last line (];), then delete every ; to get:

-24.92 -23.66 -22.55 
-24.77 -23.56 -22.45 
-24.54 -23.64 -22.56 

Then you can use numpy's loadtxt as follows.

>>> import numpy as np
>>> A = np.loadtxt('data.m')

>>> A
array([[-24.92, -23.66, -22.55],
       [-24.77, -23.56, -22.45],
       [-24.54, -23.64, -22.56]])

A solution you might find acceptable

In this solution, we write a function that coerces the input data into a form numpy's loadtxt accepts (the same form as above, actually).

import io
import numpy as np

def convert_m(fname):
    with open(fname, 'r') as fin:
        arrstr = fin.read()
    arrstr = arrstr.split('[', 1)[-1]   # remove the content up to the first '['
    arrstr = arrstr.rsplit(']', 1)[0]   # remove the content after the last ']'
    arrstr = arrstr.replace(';', '\n')  # replace ';' with newline
    return io.StringIO(arrstr)

Now that we have that, do the following.

>>> np.loadtxt(convert_m('data.m'))
array([[-24.92, -23.66, -22.55],
       [-24.77, -23.56, -22.45],
       [-24.54, -23.64, -22.56]])
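Since the question asks for something generic, the same idea can be stretched to a file holding several assignments. This is only a sketch under stated assumptions: the helper `load_m_matrices` and its regex are hypothetical, and they assume each matrix is a plain numeric `name = [ ... ];` literal with no comments, nested brackets, or `...` line continuations. `ndmin=2` makes `loadtxt` keep single rows and single columns two-dimensional.

```python
import io
import re

import numpy as np

# Hypothetical pattern for simple `name = [ ... ];` assignments.
# Assumption: numeric matrices only, no nested brackets, no comments.
ASSIGN_RE = re.compile(r'(\w+)\s*=\s*\[(.*?)\]', re.DOTALL)

def load_m_matrices(text):
    """Return {variable name: ndarray} for every matrix literal in `text`."""
    result = {}
    for name, body in ASSIGN_RE.findall(text):
        # ';' separates rows; the blank lines this creates are skipped by loadtxt.
        rows = body.replace(';', '\n')
        result[name] = np.loadtxt(io.StringIO(rows), ndmin=2)
    return result

# Hypothetical file contents with two matrices, one of them a column vector.
text = """values = [-24.92 -23.66 -22.55 ;
-24.77 -23.56 -22.45 ;
];
offsets = [1.5 ; 2.5 ; 3.5];
"""
mats = load_m_matrices(text)
print(mats['values'])
print(mats['offsets'])
```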
David Alber
  • Your answer was more or less the kind of thing I was considering. Today I am already tired, but tomorrow I will take a look to find out what suits me best. Besides, since my question asks for generic methods, I will think about a good generic method, but `loadtxt` should probably be used anyway in these cases. Thanks, and accepted for now! – heltonbiker Oct 28 '11 at 01:31

You could feed an iterator to np.genfromtxt:

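import numpy as np
import re

filename = 'data.m'  # path to the matlab file

with open(filename, 'r') as f:
    # Keep only digits, signs, decimal points and spaces on each line
    lines = (re.sub(r'[^-+.0-9 ]+', '', line) for line in f)
    arr = np.genfromtxt(lines)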

print(arr)

yields

[[-24.92 -23.66 -22.55]
 [-24.77 -23.56 -22.45]
 [-24.54 -23.64 -22.56]]

Thanks to Bitwise for clueing me in to this answer.
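One caveat given the NaN values in the asker's files: the character class above also deletes letters, so a NaN entry would vanish and leave that row with too few columns. Below is a sketch of a variant that keeps NaN tokens, assuming the only '=' on any line is the one in the assignment; the `clean` helper and the sample lines are hypothetical.

```python
import re

import numpy as np

# Hypothetical sample lines; the middle row contains a NaN.
lines_in = [
    "values = [-24.92 -23.66 -22.55 ;\n",
    "-24.77 NaN -22.45 ;\n",
    "-24.54 -23.64 -22.56 ;\n",
    "];\n",
]

def clean(line):
    """Strip matlab syntax from one line but keep numeric and NaN tokens."""
    line = line.split('=', 1)[-1]          # drop a leading "values =" if present
    return re.sub(r'[\[\];]', ' ', line)   # blank out brackets and row separators

# Whitespace-only lines (such as the cleaned "];") are skipped by genfromtxt.
arr = np.genfromtxt(clean(line) for line in lines_in)
print(arr)
```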

unutbu
  • Actually, the file you mention is a .mat file, which contains matlab variables that can be loaded by a matlab script. The file I have is (unfortunately) a .m file, which contains matlab source code (i.e., it is the script itself). If I were using matlab instead of numpy, I would run the .m file from within my script, so that its code would execute and create the matrix named `values` in the global namespace, but I am using Numpy, so... :o( – heltonbiker Oct 27 '11 at 22:46
  • That is a very professional answer. I will need some time to grasp its power, but it certainly gave me some deeper insights. Thank you very much! – heltonbiker Oct 28 '11 at 02:10