I have a file that uses a blank line as the record separator. The file looks like this:

A B C

D  F

A K F
G H

123 AB 34
34 GE PQ 56

In the format above, records are separated by an empty line. How can I read such a file using pandas? At first I thought I could use the usual read_csv function and then combine all rows up to the next empty row into a single row. But that does not seem straightforward, since detecting an empty row and combining non-indexed rows seems impossible.

Is there a workaround for this? I do not want to change the format of the files explicitly, as they are fed from an external provider and handled in an online fashion.

Shew

1 Answer

Use this solution, which joins the lines of each section into one string and passes the resulting list to the DataFrame constructor:

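import pandas as pd

def per_section(it, is_delimiter=lambda x: x.isspace()):
    # Collect lines until a whitespace-only line, then yield them as one string.
    ret = []
    for line in it:
        if is_delimiter(line):
            if ret:
                yield ''.join(ret)
                ret = []
        else:
            ret.append(line.rstrip())
    # Yield the last section if the file does not end with a blank line.
    if ret:
        yield ''.join(ret)

with open("data.txt") as f:
    s = list(per_section(f))

df = pd.DataFrame({'data': s})
print(df)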
                   data
0                 A B C
1                  D  F
2              A K FG H
3  123 AB 3434 GE PQ 56
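
Note that the code above concatenates the lines of a section without a separator, so "A K F" and "G H" become "A K FG H". If the tokens should stay apart, a possible variant is sketched below using itertools.groupby from the standard library. The name iter_records, the joiner parameter, and the in-memory sample are assumptions made for illustration, not part of the original answer.

```python
import io
from itertools import groupby


def iter_records(lines, joiner=" "):
    """Yield one string per record; records are separated by blank lines."""
    # groupby splits the stream into runs of blank and non-blank lines.
    for is_blank, group in groupby(lines, key=lambda line: not line.strip()):
        if not is_blank:
            # Join the lines of one record with the chosen separator.
            yield joiner.join(line.strip() for line in group)


# Hypothetical in-memory sample mirroring the file from the question.
sample = "A B C\n\nD  F\n\nA K F\nG H\n\n123 AB 34\n34 GE PQ 56\n"
records = list(iter_records(io.StringIO(sample)))
print(records)
```

Because the generator consumes the input lazily, it should also work on an open file handle or any other line iterator, which may suit files that arrive in an online fashion. The resulting list can be passed to pd.DataFrame({'data': records}) in the same way as the list s above.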
jezrael