How do I read one "cell" of a fixed-width column that is split over two lines? The input is a fixed-width table, like so:

ID   Description                 QTY
1    Description split over      1
     two lines
2    Description on one line     2

I'd like the DataFrame to hold the data like this:

ID   Description                           QTY
1    Description split over two lines      1       
2    Description on one line               2

My current code is:

import pandas as pd

df = pd.read_fwf('test.txt', names = ['ID', 'Description', 'QTY'])
df

But this gives me:

ID   Description                 QTY
1    Description split over      1
NaN  two lines                   NaN 
2    Description on one line     2

Any ideas?

  • Related: http://stackoverflow.com/questions/42240022/python-pandas-merge-two-or-more-lines-of-text-into-one-line, http://stackoverflow.com/questions/43761607/pandas-how-to-read-csv-with-multiple-lines-on-the-same-cell – Mel May 22 '17 at 08:07
  • Not sure if this is possible within pandas; it might simply need a regex-based script to turn your file into something more normal, where nothing spills across lines like that. – cardamom May 22 '17 at 08:26
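Along the lines of the second comment, the file can be pre-processed in plain Python before it becomes a DataFrame. This is a minimal sketch that assumes (hypothetically) that a continuation line always starts with whitespace, and that on a normal line ID is the first token and QTY the last token. `merge_continuation_lines` and `ROW_RE` are illustrative names, not part of pandas.

```python
import re

import pandas as pd

# Sample input copied from the question; the real file is assumed to share its layout.
RAW = """\
ID   Description                 QTY
1    Description split over      1
     two lines
2    Description on one line     2
"""

# Hypothetical pattern: ID is the first token, QTY the last token,
# and everything in between is the Description.
ROW_RE = re.compile(r"^(\S+)\s+(.*?)\s+(\S+)\s*$")


def merge_continuation_lines(text):
    """Parse the table, folding continuation lines into the record above.

    A line that starts with whitespace (blank ID field) is treated as
    extra Description text for the previous record.
    """
    records = []
    for line in text.splitlines()[1:]:  # skip the header line
        if not line.strip():
            continue  # ignore blank lines
        if line[0].isspace() and records:
            # Continuation line: append its text to the previous description.
            records[-1]["Description"] += " " + line.strip()
            continue
        match = ROW_RE.match(line)
        if match is None:
            raise ValueError(f"Unrecognised line: {line!r}")
        id_, desc, qty = match.groups()
        records.append({"ID": int(id_), "Description": desc, "QTY": int(qty)})
    return pd.DataFrame(records, columns=["ID", "Description", "QTY"])


merged = merge_continuation_lines(RAW)
print(merged)
```

Because the records are merged before the DataFrame is built, no NaN rows appear and the ID and QTY columns stay integers.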

1 Answer

import numpy as np

# Conditionally append the next row's Description to the current row if the next row's ID is NaN
# (assumes the default RangeIndex, so x.name is the row position; the last row is left as is).
df['Description'] = df.apply(lambda x: x.Description if x.name == len(df) - 1 else x.Description + ' ' + df.iloc[x.name + 1]['Description'] if np.isnan(df.iloc[x.name + 1]['ID']) else x.Description, axis=1)

# Drop the continuation rows, which contain NaN.
df = df.dropna()
Allen Qin
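The answer above looks only one row ahead, so it joins at most one continuation line per record. A more general variant, sketched here as an alternative rather than the answerer's method, forward-fills `ID` so every physical line carries the ID of its record, then groups by `ID` and joins the `Description` fragments. That handles descriptions spanning any number of lines. The `colspecs` boundaries below are an assumption read off the sample layout and would need adjusting for the real file.

```python
import io

import pandas as pd

# Sample input copied from the question.
RAW = """\
ID   Description                 QTY
1    Description split over      1
     two lines
2    Description on one line     2
"""

# Assumed column boundaries (half-open intervals) for ID, Description and QTY.
df = pd.read_fwf(io.StringIO(RAW), colspecs=[(0, 5), (5, 30), (30, 40)])

# A row with a blank (NaN) ID continues the record above it, so forward-fill
# the ID to tag every physical line with the record it belongs to.
df["ID"] = df["ID"].ffill()

# Join each record's Description fragments in file order and keep its first non-null QTY.
result = (
    df.groupby("ID", sort=False)
    .agg(Description=("Description", lambda s: " ".join(s.dropna())),
         QTY=("QTY", "first"))
    .reset_index()
    .astype({"ID": int, "QTY": int})
)
print(result)
```

`sort=False` keeps the records in file order, and the final `astype` restores integer types, which `read_fwf` parses as floats because of the NaN on the continuation line.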