
I am trying to read one table from a larger .txt file into Python.

An extract of the data is:

2 Network magnitudes:
    MLv       2.05 +/- 1.34   7            
    M         2.05            7 preferred  

7 Phase arrivals:
    sta  net   dist azi  phase   time         res     wt  sta
    BMOR  EC    0.0 226  P       00:22:31.385  -0.6 M  1.0  BMOR 
    BREF  EC    0.0 347  P       00:22:31.543  -0.5 M  1.0  BREF 
    BTAM  EC    0.0  58  P       00:22:31.796  -0.3 M  1.0  BTAM 
    BVC2  EC    0.0  26  P       00:22:33.061   0.8 M  1.0  BVC2 
    BNAS  EC    0.1 294  P       00:22:32.871  -0.1 M  1.0  BNAS 
    SUCR  EC    0.1 314  P       00:22:34.610   0.6 M  1.0  SUCR 
    BRRN  EC    0.1 207  P       00:22:34.768   0.4 M  1.0  BRRN 

7 Station magnitudes:
    sta  net   dist azi  type   value   res        amp per
    BMOR  EC    0.0 226  MLv     1.48 -0.57    1.20076    

I only want the phase arrivals table, and np.loadtxt and np.genfromtxt both fall short for various reasons (they either can't deal with a mix of numbers and strings, or contain a bug unless you specify a single-space (' ') delimiter, which I can't do here).

I've been trying with the pandas.read_csv function, but it isn't recognising the delimiters:

a = pd.read_csv(datafileloc, sep='\+s', skiprows=5, skipfooter=3)

produces:

a
Out[90]: 
  sta  net   dist azi  phase   time         res     wt  sta
0  BMOR  EC    0.0 226  P       00:22:31.385  -0....       
1  BREF  EC    0.0 347  P       00:22:31.543  -0....       
2  BTAM  EC    0.0  58  P       00:22:31.796  -0....       
3  BVC2  EC    0.0  26  P       00:22:33.061   0....       
4  BNAS  EC    0.1 294  P       00:22:32.871  -0....       
5  SUCR  EC    0.1 314  P       00:22:34.610   0....       
6  BRRN  EC    0.1 207  P       00:22:34.768   0.... 

which looks good, except that each row is a single string: it hasn't paid attention to the whitespace delimiters:

a.values
Out[89]: 
array([['BMOR  EC    0.0 226  P       00:22:31.385  -0.6 M  1.0  BMOR'],
       ['BREF  EC    0.0 347  P       00:22:31.543  -0.5 M  1.0  BREF'],
       ['BTAM  EC    0.0  58  P       00:22:31.796  -0.3 M  1.0  BTAM'],
       ['BVC2  EC    0.0  26  P       00:22:33.061   0.8 M  1.0  BVC2'],
       ['BNAS  EC    0.1 294  P       00:22:32.871  -0.1 M  1.0  BNAS'],
       ['SUCR  EC    0.1 314  P       00:22:34.610   0.6 M  1.0  SUCR'],
       ['BRRN  EC    0.1 207  P       00:22:34.768   0.4 M  1.0  BRRN']], dtype=object)

Each line can be split with list(a.values[0])[0].split(), but the result would then need reorganising to get individual columns. I would like pandas.read_csv to recognise the separate fields itself so that I can extract individual columns (reasonable efficiency will matter once I scale this up).
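For comparison, the manual workaround described above can be sketched as follows: split each single-string row on whitespace, then transpose the rows into columns with `zip`. The `values` array is a hand-made stand-in for `a.values` (first three rows only), not output from the real file. Every field stays a string until it is converted by hand, and column positions have to be tracked manually, which is the bookkeeping a working `sep` in `read_csv` avoids.

```python
import numpy as np

# Hand-made stand-in for `a.values` from the question (first three rows only):
# an object array with one whole-line string per row.
values = np.array(
    [
        ["BMOR  EC    0.0 226  P       00:22:31.385  -0.6 M  1.0  BMOR"],
        ["BREF  EC    0.0 347  P       00:22:31.543  -0.5 M  1.0  BREF"],
        ["BTAM  EC    0.0  58  P       00:22:31.796  -0.3 M  1.0  BTAM"],
    ],
    dtype=object,
)

# Split each single-string row on runs of whitespace ...
rows = [row[0].split() for row in values]
# ... then transpose the rows into columns with zip.
columns = list(zip(*rows))

stations = list(columns[0])                 # ['BMOR', 'BREF', 'BTAM']
residuals = [float(v) for v in columns[6]]  # [-0.6, -0.5, -0.3]
```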

Where am I going wrong?

mjp
  • I'm not sure whether to close this as a typo (you need `\s+`, not `\+s`) or as a duplicate of [this](http://stackoverflow.com/questions/15026698/how-to-make-separator-in-read-csv-more-flexible-wrt-whitespace). – DSM May 13 '16 at 00:15
  • Hey @DSM - Thanks! I've just tested it and you're right, it is a typo. However, I took it straight from the documentation, which is where the typo comes from: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html (under the delim_whitespace parameter) – mjp May 13 '16 at 00:19
  • you're right! I'll make sure that gets fixed. :-) – DSM May 13 '16 at 00:21
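A quick way to see the difference DSM describes is to apply both patterns with Python's `re.split` to one line of the table (stripped of its leading indentation). `read_csv` treats a `sep` longer than one character (other than `\s+`) as a regular expression, so `'\+s'` is not rejected; it is simply a valid pattern that never matches this data.

```python
import re

line = "    BMOR  EC    0.0 226  P       00:22:31.385  -0.6 M  1.0  BMOR "

# r"\+s" is an escaped "+" followed by "s": it only matches the literal text "+s",
# which never appears in the table, so nothing is split.
typo_fields = re.split(r"\+s", line.strip())
print(typo_fields)  # one element: the whole line

# r"\s+" means "one or more whitespace characters", so each run of spaces splits.
fields = re.split(r"\s+", line.strip())
print(fields)  # ['BMOR', 'EC', '0.0', '226', 'P', '00:22:31.385', '-0.6', 'M', '1.0', 'BMOR']
```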

1 Answer


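As pointed out by DSM, the problem is a typo in the delimiter. It should be

\s+, not \+s

The regex \+s matches only a literal "+" followed by "s", which never occurs in the table, so each line was read as a single field. The typo came from the pandas documentation, under the delim_whitespace parameter heading.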
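A corrected version of the question's call, sketched against the extract from the question via `io.StringIO` (pass `datafileloc` instead for the real file). `sep` is written as a raw string, `r"\s+"`, so Python leaves the backslash alone. Two further details are assumptions to check against the real file. First, each data row has ten whitespace-separated fields (the `res` value is followed by an unlabeled `M`), while the header line has only nine names; pandas documents that in this case it uses the first column as the row index, which would shift every label by one. The sketch therefore also skips the header line (`skiprows=6` rather than 5) and supplies ten names with `header=None`; `flag` and `sta_end` are made-up names. Second, `engine="python"` is set explicitly because the C engine does not support `skipfooter`; without it, pandas falls back to the Python engine and emits a warning.

```python
import io

import pandas as pd

# Stand-in for the real file: the extract from the question, on which the
# question's skiprows/skipfooter counts line up.
RAW = """\
2 Network magnitudes:
    MLv       2.05 +/- 1.34   7
    M         2.05            7 preferred

7 Phase arrivals:
    sta  net   dist azi  phase   time         res     wt  sta
    BMOR  EC    0.0 226  P       00:22:31.385  -0.6 M  1.0  BMOR
    BREF  EC    0.0 347  P       00:22:31.543  -0.5 M  1.0  BREF
    BTAM  EC    0.0  58  P       00:22:31.796  -0.3 M  1.0  BTAM
    BVC2  EC    0.0  26  P       00:22:33.061   0.8 M  1.0  BVC2
    BNAS  EC    0.1 294  P       00:22:32.871  -0.1 M  1.0  BNAS
    SUCR  EC    0.1 314  P       00:22:34.610   0.6 M  1.0  SUCR
    BRRN  EC    0.1 207  P       00:22:34.768   0.4 M  1.0  BRRN

7 Station magnitudes:
    sta  net   dist azi  type   value   res        amp per
    BMOR  EC    0.0 226  MLv     1.48 -0.57    1.20076
"""

# Hypothetical names: "flag" for the unlabeled "M" after res,
# "sta_end" for the repeated station code at the end of each row.
names = ["sta", "net", "dist", "azi", "phase", "time", "res", "flag", "wt", "sta_end"]

df = pd.read_csv(
    io.StringIO(RAW),  # replace with datafileloc for the real file
    sep=r"\s+",        # one or more whitespace characters
    skiprows=6,        # skip everything up to and including the file's header line
    header=None,       # no header left to read ...
    names=names,       # ... so use these ten names instead
    skipfooter=3,      # drop the trailing "Station magnitudes" lines
    engine="python",   # skipfooter is not supported by the C engine
)
print(df[["sta", "dist", "azi", "res"]])
```

Once the columns are parsed separately, individual columns are available directly, e.g. `df["res"]` or `df["time"]`.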

mjp
  • For the record, this has been fixed in trunk with [c9ffd78](https://github.com/pydata/pandas/commit/c9ffd7891dadd6e5590695e142f77a3476b5c4e3). – DSM May 13 '16 at 13:47