
[image: screenshot of the .txt file]

Hi, I have a .txt file like this one: the first column is an index, followed by three values inside a pair of parentheses "()" that represent the x, y and z coordinates.

I want to load the first four columns of this file into a pandas DataFrame. However, this turns out to be quite hard, because the delimiter is first a space, then "(", and inside the parentheses it is ",".

Could someone give me a hint on how to deal with this situation?

Thank you! Shawn

Darth BEHFANS
  • Please don't post data as pictures. We cannot cut and paste a picture. – Stephen Rauch May 22 '17 at 01:14
  • Sorry, the file is shared at: https://www.dropbox.com/s/zy95y4z3lzws5c6/Initial_Coordinate.txt?dl=0 – Darth BEHFANS May 22 '17 at 01:25
  • I don't use Dropbox; it is a security risk. Please put the data in the post. – Stephen Rauch May 22 '17 at 01:36
  • Check out [this answer](http://stackoverflow.com/questions/26551662/import-text-to-pandas-with-multiple-delimiters/26551913) for ways to read in a file with multiple delimiters and check out the `usecols` keyword argument in [the docs for the `read_csv` function](http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) – bunji May 22 '17 at 01:37

2 Answers


It is possible to write your own parser. Something like:

Code:

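import re


def parse_my_file(filename):
    """Yield the first four whitespace-separated fields of each line,
    with any surrounding commas and parentheses stripped off."""
    with open(filename) as f:
        for line in f:
            yield [x.strip(',()')
                   for x in re.split(r'\s+', line.strip())[:4]]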

Test Code:

import pandas as pd

df = pd.DataFrame(parse_my_file('file1'))
print(df)

Results:

    0       1       2  3
0  g1     -16       0  0
1  gr      10       0  0
2  D1  -6.858  2.7432  0
3  D2  -2.286  2.7432  0

This data file was created when I typed in your first four rows.
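The parser yields strings, so every column of the resulting DataFrame holds text (object dtype). A minimal follow-up sketch, assuming each line looks like `g1 (-16, 0, 0)` (the real file was only posted as an image), converts the coordinates to numbers with `pd.to_numeric`. `parse_lines` is a hypothetical variant of `parse_my_file` that reads from any iterable of lines instead of a file name, and the column names `label`, `x`, `y`, `z` are illustrative.

```python
import io
import re

import pandas as pd


def parse_lines(lines):
    """Hypothetical variant of parse_my_file that accepts any iterable of lines."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines instead of yielding an empty row
        yield [x.strip(',()') for x in re.split(r'\s+', line.strip())[:4]]


# Assumed layout "label (x, y, z)"; values taken from the four rows shown above.
sample = io.StringIO(
    "g1 (-16, 0, 0)\n"
    "gr (10, 0, 0)\n"
    "D1 (-6.858, 2.7432, 0)\n"
    "D2 (-2.286, 2.7432, 0)\n"
)

df = pd.DataFrame(parse_lines(sample), columns=['label', 'x', 'y', 'z'])

# Every parsed field is a string; convert each coordinate column to a numeric dtype.
for col in ['x', 'y', 'z']:
    df[col] = pd.to_numeric(df[col])

print(df.dtypes)
```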

Stephen Rauch
  • Thank you! I tried your parser and it works great! tell k reminded me not to use complicated delimiters, and even though I don't fully understand why yet, I think your solution puts me on the right track! I need to look deeper into your function. Thank you again! – Darth BEHFANS May 22 '17 at 02:05
  • @DarthBEHFANS , you are very welcome. However, on SO the very best way to say thanks is to upvote *any* questions or answers you find useful. And on your questions, if one of the answers is a good fit for your problem, you can mark it as the accepted answer. See the [Help Center](http://stackoverflow.com/help/someone-answers) for guidelines. – Stephen Rauch May 26 '17 at 00:03

You can use a regex pattern as the separator (`sep`) in `read_csv`.

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html

Like this:

import pandas as pd

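# A regex separator needs the Python parser engine; naming it avoids a ParserWarning.
df = pd.read_csv('Initial_Coordinate.txt', sep=r'[()]', header=None, engine='python')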
print(df)
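The pattern `[()]` only splits on the parentheses, so the three comma-separated coordinates end up together in one field. A sketch that also treats commas, and the spaces around every delimiter, as separators, and uses the `usecols` argument to keep only the first four columns. It assumes each line looks like `g1 (-16, 0, 0)`, since the actual file was only posted as an image; the sample lines are illustrative.

```python
import io

import pandas as pd

# Hypothetical sample in the assumed "label (x, y, z)" layout.
sample = io.StringIO(
    "g1 (-16, 0, 0)\n"
    "gr (10, 0, 0)\n"
    "D1 (-6.858, 2.7432, 0)\n"
    "D2 (-2.286, 2.7432, 0)\n"
)

# Split on "(", "," or ")" together with any surrounding whitespace.
# The closing ")" leaves an empty fifth field, which usecols drops.
df = pd.read_csv(
    sample,
    sep=r'\s*[(),]\s*',
    engine='python',  # regex separators require the Python parser engine
    header=None,
    usecols=[0, 1, 2, 3],
)
print(df)
```

Because `\s*` also matches zero spaces, lines written without spaces after the commas, such as `g1 (-16,0,0)`, split the same way.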

However, rather than working with complex delimiters, it is better to first convert the file to a single, simple delimiter and then read it with pandas.
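Converting to one simple delimiter first could look like the following sketch. It assumes the same `label (x, y, z)` line layout; `to_tab_delimited` is a hypothetical helper, and the sample lines stand in for the real file.

```python
import io
import re

import pandas as pd


def to_tab_delimited(lines):
    """Hypothetical helper: rewrite 'label (x, y, z)' lines as tab-separated text."""
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Replace "(", "," and ")" (plus surrounding spaces) with tabs,
        # then drop the trailing tab left by the closing parenthesis.
        out.append(re.sub(r'\s*[(),]\s*', '\t', line).rstrip('\t'))
    return '\n'.join(out) + '\n'


raw = [
    "g1 (-16, 0, 0)",
    "gr (10, 0, 0)",
    "D1 (-6.858, 2.7432, 0)",
    "D2 (-2.286, 2.7432, 0)",
]
tsv = to_tab_delimited(raw)

# With a single plain delimiter, read_csv needs no regex or special engine.
df = pd.read_csv(io.StringIO(tsv), sep='\t', header=None,
                 names=['label', 'x', 'y', 'z'])
print(df)
```

In practice you would run the conversion once over `Initial_Coordinate.txt`, write the result to a new file, and load that file with `sep='\t'` from then on.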

thx

tell k
  • Thanks! I tried sep=r'[(,)]' and it works. May I ask what the '[]' is for in the regex pattern? And regarding your comment on "fix it as a simple delimiter": does it mean writing a parser rule that substitutes the different delimiters with a uniform one? – Darth BEHFANS May 22 '17 at 01:50
  • @DarthBEHFANS "[]" is a regex metacharacter: it defines a character class, which matches any one of the characters listed inside it, here either '(' or ')'. See also https://help.kcura.com/9.0/Content/Relativity/Regular_expressions/Regular_expression_metacharacters.htm – tell k May 22 '17 at 02:48
  • @DarthBEHFANS > does it mean create some parser rule and substitute certain delimiters with uniform delimiter? No, it does not. I mean that it is better to convert the original file (Initial_Coordinate.txt) to use a simple delimiter, for example the tab character. – tell k May 22 '17 at 02:51
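To see the difference between the answer's `[()]` and the `[(,)]` pattern mentioned in the comments, here is a quick check with Python's `re.split` on a made-up line in the assumed `label (x, y, z)` format:

```python
import re

line = "g1 (-16, 0, 0)"  # hypothetical line in the assumed format

# [()] is a character class matching a single "(" or ")".
parens_only = re.split(r'[()]', line)
print(parens_only)  # ['g1 ', '-16, 0, 0', '']

# Adding "," to the class also splits the three coordinates apart.
with_comma = re.split(r'[(,)]', line)
print(with_comma)  # ['g1 ', '-16', ' 0', ' 0', '']
```

Note that the spaces next to each delimiter stay attached to the fields, so the label comes out as `'g1 '` with a trailing space.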