Pandas read "delimited" file

Question

Hi, I have a .txt file in which the first column is an index, followed by three columns inside a pair of parentheses "()" that hold the x, y and z coordinates.

I want to load the first four columns of this file into a pandas DataFrame. However, I find this hard because the delimiters vary: first a space " ", then "(", and inside the parentheses ",".

Could someone give me a hint on how to deal with this situation?

Thank you! Shawn

Please don't post data as pictures. We cannot cut and paste a picture. — Stephen Rauch, May 22 '17 at 01:14
Sorry, the file is shared at: https://www.dropbox.com/s/zy95y4z3lzws5c6/Initial_Coordinate.txt?dl=0 — Darth BEHFANS, May 22 '17 at 01:25
I don't do Dropbox; it is a security risk. Please put the data in the post. — Stephen Rauch, May 22 '17 at 01:36
Check out [this answer](http://stackoverflow.com/questions/26551662/import-text-to-pandas-with-multiple-delimiters/26551913) for ways to read in a file with multiple delimiters and check out the `usecols` keyword argument in [the docs for the `read_csv` function](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) — bunji, May 22 '17 at 01:37

score 0 · Answer 1 · answered May 22 '17 at 01:39


It is possible to write your own parser. Something like:

Code:

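import re

def parse_my_file(filename):
    """Yield the first four whitespace-separated fields of each line,
    with any surrounding ',', '(' and ')' characters stripped off."""
    with open(filename) as f:
        for line in f:
            yield [x.strip(',()')
                   for x in re.split(r'\s+', line.strip())[:4]]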

Test Code:

import pandas as pd

df = pd.DataFrame(parse_my_file('file1'))
print(df)

Results:

    0       1       2  3
0  g1     -16       0  0
1  gr      10       0  0
2  D1  -6.858  2.7432  0
3  D2  -2.286  2.7432  0

This data file was created when I typed in your first four rows.
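The parser yields every field as a string, so all four columns come out with object dtype. A minimal sketch of converting the coordinates to numbers, assuming each line looks like "name (x, y, z)" (a layout reconstructed from the four result rows, not the original file). `parse_lines` and `SAMPLE_LINES` are hypothetical names; `parse_lines` applies the same splitting rule to any iterable of lines instead of a filename:

```python
import re

import pandas as pd

# Hypothetical sample rows; the "name (x, y, z)" layout is an assumption
# reconstructed from the four result rows, not the original file.
SAMPLE_LINES = [
    "g1 (-16, 0, 0)",
    "gr (10, 0, 0)",
    "D1 (-6.858, 2.7432, 0)",
    "D2 (-2.286, 2.7432, 0)",
]


def parse_lines(lines):
    """Same splitting rule as parse_my_file, applied to any iterable of lines."""
    for line in lines:
        yield [x.strip(',()') for x in re.split(r'\s+', line.strip())[:4]]


df = pd.DataFrame(parse_lines(SAMPLE_LINES), columns=['name', 'x', 'y', 'z'])
# Every field is still a string here; convert each coordinate column to numbers.
for col in ['x', 'y', 'z']:
    df[col] = pd.to_numeric(df[col])
df = df.set_index('name')
print(df.dtypes)
```

Keeping the parser string-only and converting afterwards leaves the splitting logic unchanged; `pd.to_numeric` raises an error if a field is not a valid number, which helps spot lines that do not match the assumed layout.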


Stephen Rauch


Thank you! I tried your parser and it works great! tell k reminded me not to use complicated delimiters, and even though I don't fully understand why, I think your solution puts me on the right track. I need to look deeper into your function. Thank you again! – Darth BEHFANS May 22 '17 at 02:05
@DarthBEHFANS , you are very welcome. However, on SO the very best way to say thanks is to upvote *any* questions or answers you find useful. And on your questions, if one of the answers is a good fit for your problem, you can mark it as the accepted answer. See the [Help Center](http://stackoverflow.com/help/someone-answers) for guidelines. – Stephen Rauch May 26 '17 at 00:03

score 0 · Answer 2 · answered May 22 '17 at 01:39


You can use a regex pattern as the separator (the sep argument of read_csv):

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

Like this:

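import pandas as pd

# Regex separators are handled only by the Python engine; passing
# engine='python' avoids the fallback ParserWarning. [()] splits on '(' and ')'
# only, so the comma-separated x, y, z values stay together in one column.
df = pd.read_csv('Initial_Coordinate.txt', sep=r'[()]', header=None, engine='python')
print(df)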
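The [()] separator splits only on the parentheses, which leaves the three coordinates in a single comma-separated field. The question's author reports that r'[(,)]' works; here is a sketch of that idea with two adjustments that are assumptions rather than part of this answer: \s* on both sides of the character class so the surrounding spaces are absorbed, and usecols to keep only the first four fields (the closing ')' leaves an empty last field). The sample text and the column names are hypothetical:

```python
import io

import pandas as pd

# Hypothetical stand-in for Initial_Coordinate.txt; the real layout is assumed.
SAMPLE = io.StringIO(
    "g1 (-16, 0, 0)\n"
    "gr (10, 0, 0)\n"
    "D1 (-6.858, 2.7432, 0)\n"
    "D2 (-2.286, 2.7432, 0)\n"
)

# Split on '(' ',' or ')' together with any whitespace around them.
# The closing ')' produces an empty fifth field, so keep columns 0-3 only.
df = pd.read_csv(
    SAMPLE,
    sep=r'\s*[(,)]\s*',
    engine='python',
    header=None,
    usecols=[0, 1, 2, 3],
)
df.columns = ['name', 'x', 'y', 'z']
print(df)
```

Because the whitespace is consumed by the separator itself, the name column has no trailing space and the coordinate columns are parsed as numbers directly.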

However, rather than building complex delimiters, it is better to convert the file to a single simple delimiter first and then read it with pandas.
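One way to follow that advice, sketched under assumptions: rewrite the data once so that a tab is the only delimiter, then read it with a plain single-character sep. `to_tab_delimited` and the sample rows are hypothetical, and dropping empty fields assumes the file has no genuinely empty columns:

```python
import io
import re

import pandas as pd

# A '(' ',' or ')' with optional surrounding whitespace, or any other run of
# whitespace, counts as one delimiter.
DELIMS = re.compile(r'\s*[(,)]\s*|\s+')


def to_tab_delimited(lines):
    """Hypothetical helper: return the lines rewritten with tabs as the only delimiter."""
    out = []
    for line in lines:
        # Drop the empty strings left behind by a trailing delimiter such as ')'.
        fields = [f for f in DELIMS.split(line.strip()) if f]
        out.append('\t'.join(fields))
    return '\n'.join(out) + '\n'


# Hypothetical rows in the assumed "name (x, y, z)" layout.
raw = [
    "g1 (-16, 0, 0)",
    "gr (10, 0, 0)",
    "D1 (-6.858, 2.7432, 0)",
    "D2 (-2.286, 2.7432, 0)",
]
tsv = to_tab_delimited(raw)
# In practice you would write tsv to disk once and read that file from then on.
df = pd.read_csv(io.StringIO(tsv), sep='\t', header=None)
print(df)
```

A single-character sep like '\t' is handled by pandas' default C engine, so no regex is needed at read time.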

thx


tell k


Thanks! I tried sep=r'[(,)]' and it works. May I ask what the '[]' is for in a regex pattern? And regarding your suggestion to use a simple delimiter, do you mean writing a parser rule that substitutes the various delimiters with one uniform delimiter? – Darth BEHFANS May 22 '17 at 01:50

@DarthBEHFANS "[]" is a regex metacharacter that defines a character class: it matches any one of the characters listed inside it. So [()] matches either '(' or ')', and [(,)] also matches ','. See also https://help.kcura.com/9.0/Content/Relativity/Regular_expressions/Regular_expression_metacharacters.htm – tell k May 22 '17 at 02:48
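A quick check of the character-class behaviour with Python's re module, using a hypothetical line in the assumed "name (x, y, z)" layout. Inside [...], '(' and ')' are ordinary characters, so they need no escaping:

```python
import re

line = "g1 (-16, 0, 0)"  # hypothetical line; the real file layout is assumed

# [()] matches a single '(' or ')'.
parens_only = re.split(r'[()]', line)
# [(,)] matches a single '(', ',' or ')'.
with_commas = re.split(r'[(,)]', line)

print(parens_only)  # ['g1 ', '-16, 0, 0', '']
print(with_commas)  # ['g1 ', '-16', ' 0', ' 0', '']
```

The leftover spaces and the empty last field show why the parsed columns may still need trimming or a usecols selection.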

@DarthBEHFANS No, I don't mean a parser rule. I mean that it is better to convert the original file (Initial_Coordinate.txt) to use a simple delimiter, for example a tab character. – tell k May 22 '17 at 02:51
