3

I need to edit input from a program that works with fixed-width columns. I managed to load it properly and worked out what I needed, but I don't know how to save the edited file to keep the same width of columns.

import pandas as pd

file = pd.read_fwf('file.inp', 
                  colspecs = [(0, 6), (6, 11), (11, 16), (16, 20), (20, 22), 
                              (22, 26), (26, 38), (38, 46), (46, 54), (54, 61),
                              (61, 68), (68, 90)])

The problem is that the number of spaces between the columns or the length of the strings of data vary, so I can't simply add fixed number of white spaces between the columns.

The file to edit looks like this:

ATOM    873  N   ALA A  59      41.629  23.754-163.394  1.00 12.93           N
ATOM   5089  NH1 ARG A 315      21.344 -13.371 187.612  1.00 66.09           N1+
ATOM   7839 H5''   A B   3      31.406  -4.882-165.817  1.00 16.98           H 
HETATM 7766 H161 G3A B   1      42.941   1.714-165.146  1.00 14.70           H  

It is a mixture of numbers, strings, special characters and columns glued together.

Anavae
  • 482
  • 1
  • 3
  • 13
  • @jezrael - I did that and the widths in the list of tuples are correct. The file is loaded into a data frame properly. I only have a problem with saving it again with the same widths of columns. – Anavae Jan 21 '20 at 12:40
  • Interesting, I found a `read_fwf` but not a `write_fwf`! – urban Jan 21 '20 at 12:52

1 Answers1

1

As I commented, I could not find a write_fdf method in pandas. However, I think you can achieve what you want with tabulate. I am basing the following code on this post and I have not run it:

import pandas as pd
from tabulate import tabulate


df = pd.read_fwf('file.inp', 
                  colspecs = [(0, 6), (6, 11), (11, 16), (16, 20), (20, 22), 
                              (22, 26), (26, 38), (38, 46), (46, 54), (54, 61),
                              (61, 68), (68, 90)])


with open("...", "w") as f:
    f.write(
        tabulate(
            [list(row) for row in df.values],
            tablefmt="plain"
        )
    )

Note again that I have not run this and is only to demo how to use tabulate:

  • I have not passed headers parameter which you can use if you need to
  • I used format "plain" to avoid any table-like decorations

More on tabulate here

UPDATE: I realize the output needs to align to specific columns so the following seems to get close enough (minor specs editing might be needed):

import pandas as pd
import sys


# Mock data
lst = [
    ["ATOM", 873, "N", "ALA", "A", 59, 41.629, 23.754, -163.394, 1.00, 12.93, "N"],
    ["ATOM", 5089, "NH1", "ARG", "A", 315, 21.344, -13.371, 187.612, 1.00 ,66.09, "N1+"],
    ["ATOM", 7839, "H5''", "A", "B", 3, 31.406, -4.882, -165.817, 1.00, 16.98, "H" ],
    ["HETATM", 7766, "H161", "G3A", "B", 1, 42.941, 1.714, -165.146, 1.00, 14.70, "H"],
]

# NOTE the spaces at the end, only when needed
colspecs = [
    "{: <6} ", # left, width=6
    "{: >4} ", # right, width=4
    "{: >4} ",
    "{: >3} ",
    "{: >1} ",
    "{: >3} ",
    "{: >11} ",
    "{: >7}",
    "{: >8} ",
    "{: >5} ",
    "{: <15} ",
    "{: <3}",
]


def write_fdf(fpath, pd, specs):
    """
    Write a Pandas dataframe in fixed width column format with the given
    column specs

    Args:
        fpath: File path
        ps: Dataframe
        specs: A list of python formats
    """
    with open(fpath, "w") as f:
        for _, row in df.iterrows():
            for idx, value in enumerate(row):
                sys.stdout.write(specs[idx].format(value))
                f.write(specs[idx].format(value))

            f.write("\n")
            print("")


df = pd.DataFrame(lst)
write_fdf("/tmp/out.dat", pd, colspecs)

Output:

$ python ~/tmp/test.py
ATOM    873    N ALA A  59      41.629  23.754-163.394   1.0 12.93           N
ATOM   5089  NH1 ARG A 315      21.344 -13.371 187.612   1.0 66.09           N1+
ATOM   7839 H5''   A B   3      31.406  -4.882-165.817   1.0 16.98           H
HETATM 7766 H161 G3A B   1      42.941   1.714-165.146   1.0 14.7            H


$ cat /tmp/out.dat
ATOM    873    N ALA A  59      41.629  23.754-163.394   1.0 12.93           N
ATOM   5089  NH1 ARG A 315      21.344 -13.371 187.612   1.0 66.09           N1+
ATOM   7839 H5''   A B   3      31.406  -4.882-165.817   1.0 16.98           H
HETATM 7766 H161 G3A B   1      42.941   1.714-165.146   1.0 14.7            H

However, I see one exceptions in the formatting which seem to break the rules: row=0, col=2: "N" seems to be centered in your example

urban
  • 5,392
  • 3
  • 19
  • 45
  • I feel it's almost there. But the problem is that I don't know now how to tell tabulate to make columns with specific widths. It would be ideal to somehow pass the colspec list of tuples and tell tabulate to put data between the specific columns... – Anavae Jan 21 '20 at 13:39
  • Hmm didn't think of that! I will look in docs but not sure if this is supported... – urban Jan 21 '20 at 14:02