1

I have some data in txt format with 38 columns which looks like this:

screenshot

With the exception of the header row, most of the rows have missing values. I want to convert this table into an array/ dataframe/ excel. But it is not coming as it looks in table.

I tried using python

df = pandas.read_csv(filename, sep='\s+',names=colnames, header=None)

I am confused about what seperator to use.

The program should look for value after single space. If no value is present, fill it with nan. How to do that?
Thanks in advance!

Manojk07
  • 337
  • 2
  • 10
  • 1
    Using `pd.read_csv(filename, sep='\s+')` works fine on my computer. Which version of pandas are you using? I'm using `0.24.2` – tdpr Feb 07 '20 at 15:02
  • 1
    @tdpr I am using pandas `0.25.1` . For me, the above command got executed without giving any error but R/F col values coming to AW, RA col values to R/F and so on as `sep='\s+'` is looking for one or more spaces. – Manojk07 Feb 10 '20 at 05:15

1 Answers1

2

You can use pandas.read_fwf (fixed-width format):

>>> df = pd.read_fwf('data.txt')
>>> df
    INDEX  YEAR  MN  DT   MAX   MIN  ...  T.2  G.2  DUR.2  T.3  G.3  DUR.3
0   14210  1972   9   1  32.0  22.0  ...  NaN  NaN    NaN  NaN  NaN    NaN
1   14210  1972   9   2  32.3  21.5  ...  NaN  NaN    NaN  NaN  NaN    NaN
2   14210  1972   9   3  32.8  22.4  ...  NaN  NaN    NaN  NaN  NaN    NaN
3   14210  1972   9   4  32.0  22.0  ...  NaN  NaN    NaN  NaN  NaN    NaN
4   14210  1972   9   5  33.2  23.6  ...  0.0  7.0  280.0  NaN  NaN    NaN
5   14210  1972   9   6  31.6  23.2  ...  5.0  8.0   45.0  0.0  8.0    NaN
6   14210  1972   9   7  31.5  21.0  ...  5.0  4.0   45.0  NaN  NaN    NaN
7   14210  1972   9   8  29.7  21.6  ...  NaN  NaN    NaN  NaN  NaN    NaN
8   14210  1972   9   9  29.7  21.1  ...  NaN  NaN    NaN  NaN  NaN    NaN
9   14210  1972   9  10  27.6  21.5  ...  NaN  NaN    NaN  NaN  NaN    NaN
10  14210  1972   9  11  30.3  21.3  ...  6.0  1.0   80.0  NaN  NaN    NaN
11  14210  1972   9  12  30.6  22.0  ...  5.0  5.0   30.0  NaN  NaN    NaN
12  14210  1972   9  13  30.2  21.4  ...  0.0  7.0  195.0  NaN  NaN    NaN
13  14210  1972   9  14  28.2  21.5  ...  NaN  NaN    NaN  NaN  NaN    NaN
14  14210  1972   9  15  30.3  21.9  ...  0.0  7.0  305.0  NaN  NaN    NaN
15  14210  1972   9  17  32.0  22.0  ...  6.0  7.0  135.0  NaN  NaN    NaN
16  14210  1972   9  18  32.0  20.5  ...  6.0  6.0   80.0  5.0  NaN    NaN

[17 rows x 38 columns]
Seb
  • 4,422
  • 14
  • 23