python alternative to scan('file', what=list(...)) in R

Question

I have a file in the following format:

I want to create a dataframe from this file (skipping the first 5 lines) like this:

x1   x2    y1  y2
0.00 0.00  0   1
0.00 0.01  0   1

So the lines are converted to columns (where every third line is also split into two columns, y1 and y2).

In R I did this as follows:

df = as.data.frame(scan(".../test.txt", what=list(x1=0, x2=0, y1=0, y2=0), skip=5))

I am looking for a Python alternative (pandas?) to R's scan(file, what=list(...)). Does one exist, or do I have to write a longer script?

Jon Clements · Accepted Answer · 2013-12-03T13:24:50.100

3

You can skip the first 5, and then take groups of 4 to build a Python list, then put that in pandas as a start... I wouldn't be surprised if pandas offered something better though:

from itertools import islice, zip_longest  # izip_longest on Python 2

with open('input') as fin:
    # Skip header(s) at start
    after5 = islice(fin, 5, None)
    # Take remaining data and group it into groups of 4 lines each... The
    # first 2 are float data, the 3rd is two integers together, and the 4th
    # is the blank line between groups... We use zip_longest to ensure we
    # always have 4 items (padded with None if needs be)...
    for lines in zip_longest(*[iter(after5)] * 4):
        # Convert first two lines to float, and take 3rd line, split it and
        # convert to integers
        print(list(map(float, lines[:2])) + list(map(int, lines[2].split())))

#[0.0, 0.0, 0, 1]
#[0.0, 0.01, 0, 1]
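
If you want rows you can hand to pandas rather than printed lists, the same grouping can be wrapped in a generator. This is only a sketch for Python 3: `parse_groups` is a made-up name, the `h1`..`h5` header lines are placeholders, and the layout (two float lines, one line with two integers, one blank line) is assumed from the description above.

```python
from itertools import islice, zip_longest


def parse_groups(lines, skip=5):
    """Yield (x1, x2, y1, y2) tuples from an iterable of text lines.

    Assumed layout after `skip` header lines: two float lines, one line
    holding two integers, then a blank separator line.
    """
    body = islice(lines, skip, None)
    for x1, x2, ys, _blank in zip_longest(*[iter(body)] * 4):
        if x2 is None or ys is None:
            break  # incomplete trailing group: stop instead of crashing
        y1, y2 = ys.split()
        yield float(x1), float(x2), int(y1), int(y2)


# Placeholder data standing in for the real file contents
sample = "h1\nh2\nh3\nh4\nh5\n0.00\n0.00\n0 1\n\n0.00\n0.01\n0 1\n"
rows = list(parse_groups(sample.splitlines(keepends=True)))
print(rows)  # [(0.0, 0.0, 0, 1), (0.0, 0.01, 0, 1)]
```

An open file object works as `lines` too, and the resulting list can be passed to `pd.DataFrame(rows, columns=['x1', 'x2', 'y1', 'y2'])`.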

edited Dec 03 '13 at 13:24

answered Dec 03 '13 at 13:12

Tnx Jon! If pandas (or something else) has a more concise function like scan() in R, would be awesome. – 2xu Dec 03 '13 at 13:17

+1 nice one @JonClements, could you explain it a bit? – Roman Pekar Dec 03 '13 at 13:22
@2xu I don't think it does... but there's people out there with way more pandas experience than I... For non trivial pre-processing - you generally end up writing a custom function that yields valid rows for use in a `DataFrame` anyway... – Jon Clements Dec 03 '13 at 13:22
@RomanPekar added a bit - hope it helps - if not, let me know – Jon Clements Dec 03 '13 at 13:26
@JonClements and why do you need `iter` around `after5`? – Roman Pekar Dec 03 '13 at 13:31
@RomanPekar probably best to point to [this question and answers](http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks-in-python) for that :) – Jon Clements Dec 03 '13 at 13:59
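
On the `iter` question, a small sketch may help (the `chunks` helper is just an illustrative name). The trick relies on `zip` pulling from the *same* iterator object several times per output tuple. Calling `iter()` on something that is already an iterator, such as the result of `islice`, simply returns it, so it is harmless there; on a list it is what makes the grouping work.

```python
def chunks(iterable, size):
    """Group items into tuples of `size`, dropping an incomplete tail."""
    shared = iter(iterable)       # one iterator object...
    return zip(*[shared] * size)  # ...referenced `size` times by zip


data = ["a", "b", "c", "d", "e", "f"]

# Each tuple consumes `size` consecutive items from the shared iterator
print(list(chunks(data, 2)))  # [('a', 'b'), ('c', 'd'), ('e', 'f')]

# Passing the list itself makes zip create independent iterators,
# so every item is just paired with itself
print(list(zip(*[data] * 2))[:2])  # [('a', 'a'), ('b', 'b')]
```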

Giupo · Answer 2 · 2013-12-03T19:03:40.743

0

As far as I can tell, none of the options listed at http://pandas.pydata.org/pandas-docs/stable/io.html will organize your DataFrame the way you want.

But you can achieve it easily:

import re
import pandas as pd

with open('YourDataFile.txt') as fin:
    text = fin.read()                        # read the whole file
elems = re.split(r'\s+', text.strip())[5:]   # split on any whitespace, drop the first 5 tokens
grouped = list(zip(*[iter(elems)] * 4))      # group them 4 by 4
df = pd.DataFrame(grouped)                   # construct DataFrame
df.columns = ['x1', 'x2', 'y1', 'y2']        # column names

It's not concise, it's not elegant, but it's clear what it does...

edited Dec 03 '13 at 19:03

answered Dec 03 '13 at 13:27

Nice one. Had to look up the *iter(elems)*4 part, but found it. And I'm not looking for elegancy, just brute force :-) – 2xu Dec 03 '13 at 15:39
And there was a typo too (elem instead of elems). Glad you understood it ;) – Giupo Dec 03 '13 at 19:04

score 0 · Answer 3 · answered Dec 05 '13 at 07:51

OK, here's how I did it (it is in fact a combo of Jon's & Giupo's answer, tnx guys!):

import pandas as pd

with open('myfile.txt') as fin:
    data = fin.read().split()[5:]        # whitespace-separated tokens, minus the first 5
grouped = list(zip(*[iter(data)] * 4))   # rows of 4 values each
df = pd.DataFrame(grouped)
df.columns = ['x1', 'x2', 'y1', 'y2']
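
For something closer in spirit to scan(file, what=list(...), skip=5), here is a rough sketch. `scan_frame` is a hypothetical helper, not a pandas function. It skips whole lines rather than tokens (so multi-word header lines are fine), converts each column with the callable you give it, and drops an incomplete trailing record; that last part is a choice made here, not a claim about how R behaves.

```python
import pandas as pd


def scan_frame(path, columns, skip=0):
    """Read whitespace-separated fields into a DataFrame, record by record.

    `columns` maps column name -> converter (e.g. float or int); its order
    defines the order of fields within each record.
    """
    with open(path) as fin:
        lines = fin.readlines()[skip:]          # skip whole header lines
    tokens = " ".join(lines).split()            # all remaining fields, in order
    names = list(columns)
    width = len(names)
    usable = len(tokens) - len(tokens) % width  # ignore an incomplete last record
    records = [tokens[i:i + width] for i in range(0, usable, width)]
    df = pd.DataFrame(records, columns=names)
    for name, convert in columns.items():
        df[name] = df[name].map(convert)        # strings -> requested type
    return df
```

Usage would look like `df = scan_frame('myfile.txt', {'x1': float, 'x2': float, 'y1': int, 'y2': int}, skip=5)`, which also gives you numeric columns instead of strings.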
