5

I need to read columns of complex numbers in the format:

# index; (real part, imaginary part); (real part, imaginary part) 

  1              (1.2, 0.16)                  (2.8, 1.1)
  2              (2.85, 6.9)                  (5.8, 2.2)

NumPy seems great for reading in columns of data with only a single delimiter, but the parentheses seem to ruin any attempt at using numpy.loadtxt().

Is there a clever way to read in the file with Python, or is it best to just read the file, remove all of the parentheses, then feed it to NumPy?

This will need to be done for thousands of files so I would like an automated way, but maybe NumPy is not capable of this.

piet.t
f4hy

3 Answers

5

Here's a more direct way than @Jeff's answer, telling loadtxt to load it in straight to a complex array, using a helper function parse_pair that maps (1.2,0.16) to 1.20+0.16j:

>>> import re
>>> import numpy as np

>>> pair = re.compile(r'\(([^,\)]+),([^,\)]+)\)')
>>> def parse_pair(s):
...    return complex(*map(float, pair.match(s).groups()))

>>> s = '''1 (1.2,0.16) (2.8,1.1)
2 (2.85,6.9) (5.8,2.2)'''
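>>> from io import StringIO  # cStringIO exists only on Python 2
>>> f = StringIO(s)

>>> # encoding=None hands str (not bytes) to the converters on Python 3
>>> np.loadtxt(f, delimiter=' ', dtype=complex, encoding=None,
...            converters={1: parse_pair, 2: parse_pair})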
array([[ 1.00+0.j  ,  1.20+0.16j,  2.80+1.1j ],
       [ 2.00+0.j  ,  2.85+6.9j ,  5.80+2.2j ]])

Or in pandas:

>>> import pandas as pd
>>> f.seek(0)
>>> pd.read_csv(f, delimiter=' ', index_col=0, names=['a', 'b'],
...             converters={1: parse_pair, 2: parse_pair})
             a           b
1  (1.2+0.16j)  (2.8+1.1j)
2  (2.85+6.9j)  (5.8+2.2j)
Danica
  • I had never heard about pandas. Seems pretty useful. Thanks! – f4hy May 21 '13 at 15:26
  • Running this exact example on my machine returns an error: "ValueError: complex() arg is a malformed string". Any explanation? – Joey Dumont Jan 21 '14 at 19:14
  • @JoeyDumont I just tried it again and it still worked with python 2.7.6 / numpy 1.7.1. What python / numpy versions are you using? – Danica Jan 23 '14 at 04:09
  • @JoeyDumont Actually, I just tried with python 3.3 and got a similar error. Tracking it down a bit, it seems to be that numpy is reading its inputs as `bytes`, and for some reason e.g. `complex(b'1')` fails with a `TypeError` whereas `float(b'1')` works just fine. This should be a problem whenever you read anything with `loadtxt(..., dtype=complex)`; you can work around it by adding a converter for everything else of e.g. `lambda x: complex(x.decode() if isinstance(x, bytes) else x)`. `parse_pair` will also need to get its input decoded to match it against a regex. Annoying.... – Danica Jan 23 '14 at 04:22
  • I use Python 2.7.6 / numpy 1.8.0. I'm not sure of how to change `parse_pair` to try your solution. – Joey Dumont Jan 23 '14 at 14:41
  • Ah, numpy 1.8 may be doing the same kind of bytes handling, maybe. I don't have an install handy to try it out, but in 2.7 all you should need to do is replace `pair.match(s)` with `pair.match(s.decode())`. That wouldn't give you the error you got, though; try adding `{0: lambda x: complex(x.decode())}` to the `converters` dictionary? – Danica Jan 23 '14 at 18:30
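The workaround discussed in these comments could be sketched for Python 3 roughly as follows. This is a minimal sketch, assuming the converters may be handed either `bytes` or `str` depending on the NumPy version; `_as_text` and `parse_index` are hypothetical helper names, and only `parse_pair` comes from the answer itself.

```python
import re
from io import StringIO

import numpy as np

# Matches "(real,imag)", tolerating optional spaces around the numbers.
PAIR = re.compile(r'\(\s*([^,()\s]+)\s*,\s*([^,()\s]+)\s*\)')

def _as_text(s):
    # Hypothetical helper: decode only when loadtxt passes bytes.
    return s.decode() if isinstance(s, bytes) else s

def parse_pair(s):
    real, imag = PAIR.match(_as_text(s).strip()).groups()
    return complex(float(real), float(imag))

def parse_index(s):
    # Explicit converter for the plain index column, as suggested in the comments.
    return complex(_as_text(s))

data = StringIO("1 (1.2,0.16) (2.8,1.1)\n2 (2.85,6.9) (5.8,2.2)\n")
arr = np.loadtxt(data, delimiter=' ', dtype=complex,
                 converters={0: parse_index, 1: parse_pair, 2: parse_pair})
print(arr)
```

Decoding inside the converters, rather than relying on a particular `loadtxt` default, keeps the same code usable whether the input arrives as `bytes` or `str`.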
4

Since this issue is still not resolved in pandas, let me add another solution. You could modify your DataFrame with a one-liner after reading it in:

import pandas as pd

df = pd.read_csv('data.csv')
df = df.apply(lambda col: col.apply(lambda val: complex(val.strip('()'))))
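For illustration, here is a small self-contained version of that one-liner. It is only a sketch: the CSV text is made up, and it assumes the values are stored as `(real+imagj)` strings, the same form as the pandas output shown in the first answer. The helper name `to_complex` is hypothetical.

```python
from io import StringIO

import pandas as pd

# Made-up CSV contents with complex values written as "(real+imagj)".
csv_text = "a,b\n(1.2+0.16j),(2.8+1.1j)\n(2.85-6.9j),(5.8+2.2j)\n"

def to_complex(val):
    # str() keeps this safe if a value was already parsed as a number.
    return complex(str(val).strip("()"))

df = pd.read_csv(StringIO(csv_text))
df = df.apply(lambda col: col.apply(to_complex))
print(df)
```

Note that this converts every column, so an integer index column would need to be set with `index_col` (or excluded) first.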
jakosz
2

If your file only has 5 columns like you've shown, you could feed it to pandas with a regex for conversion, replacing the parentheses with commas on every line. After that, you could combine them as suggested in this SO answer to get complex numbers.

Pandas makes this easier because its read_csv method accepts a regex as the delimiter, which numpy.loadtxt does not. That lets you write clearer code and use a converter like this:

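import pandas as pd
from io import StringIO

f_str = "1 (2, 3) (5, 6)\n2 (3, 4) (4, 8)\n3 (0.2, 0.5) (0.6, 0.1)"
buf = StringIO(f_str)

def complex_converter(txt):
    # "2, 3" -> "2+3j"; "+-" is collapsed so negative imaginary parts parse
    txt = txt.strip("()").replace(", ", "+").replace("+-", "-") + "j"
    return complex(txt)

# A regex delimiter needs the Python parsing engine; header=None keeps the
# first data row from being consumed as column names.
df = pd.read_csv(buf, delimiter=r" \(|\) \(", engine="python", header=None,
                 converters={1: complex_converter, 2: complex_converter},
                 index_col=0)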

EDIT: Looks like @Dougal came up with this just before I posted this...really just depends on how you want to handle the complex number. I like being able to avoid the explicit use of the re module.
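The other route mentioned at the top of this answer (and in the question), stripping the punctuation so the file becomes five plain numeric columns and then combining them, might look something like the sketch below. It uses NumPy only, replaces the parentheses and commas with spaces rather than commas, and the variable names are purely illustrative.

```python
from io import StringIO

import numpy as np

raw = """# index; (real part, imaginary part); (real part, imaginary part)
  1              (1.2, 0.16)                  (2.8, 1.1)
  2              (2.85, 6.9)                  (5.8, 2.2)
"""

# Turn parentheses and commas into spaces, leaving five
# whitespace-separated numeric columns per data row.
cleaned = StringIO(raw.translate(str.maketrans("(),", "   ")))

# Lines starting with '#' are skipped as comments by default.
cols = np.loadtxt(cleaned, ndmin=2)

index = cols[:, 0].astype(int)
# Columns 1 and 3 hold real parts, columns 2 and 4 the imaginary parts.
values = cols[:, 1::2] + 1j * cols[:, 2::2]
print(index, values)
```

For a real file, `raw` would be replaced by the file's contents, for example `open(path).read()` with a path of your own.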

Community
Jeff Tratner
  • Just a note if anyone else looks at this for help, your complex_converter does not work if the imaginary part is negative. The complex method seems to not work for "1+-3j" – f4hy May 24 '13 at 16:05
  • @f4hy fixed it to replace `"+-"` with `"-"`. Works for negative imaginary numbers now :) – Jeff Tratner May 24 '13 at 18:51