Read in a .txt file and transform it into a pandas DataFrame when the number of spaces separating values varies

Question

This script reads in a txt file and creates a df, but I want the 'sep' argument to handle values that may be separated by one or more spaces. When I run the script below, I get many columns with NaN.

code:

df = pd.read_csv(data_file,header = None, sep=' ')

example txt file

blah blahh    bl
blah3 blahhe      ble

I want there to be just 3 columns, so I get:

Col_a  Col_b   Col_c
blah   blahh    bl
blah3  blahhe   ble

Try `df = pd.read_csv(data_file, header=None, sep=r'\s+', names='Col_a Col_b Col_c'.split(' '))`, which uses a regex matching one or more whitespace characters. - Scott Boston, Dec 31 '21 at 02:09

Viettel Solutions · Accepted Answer · 2021-12-31T02:22:42.173

3

You can use regex as the delimiter:

pd.read_csv(data_file, header=None, delimiter=r"\s+", names='Col_a Col_b Col_c'.split(' '))

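Or you can use the delim_whitespace=True argument, which the pandas documentation describes as equivalent to sep='\s+' (note that it is deprecated since pandas 2.2 in favor of sep=r"\s+"):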

pd.read_csv(data_file, header=None, delim_whitespace=True, names='Col_a Col_b Col_c'.split(' '))

Reference: How to read file with space separated values in pandas
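For a quick check without a file on disk, here is a minimal sketch of the regex-delimiter approach. It assumes the two sample lines from the question and uses io.StringIO as a stand-in for data_file; the variable name sample is made up for illustration.

```python
import io

import pandas as pd

# Sample data from the question: fields separated by a varying number of spaces.
# `sample` is a hypothetical stand-in for the contents of data_file.
sample = "blah blahh    bl\nblah3 blahhe      ble\n"

df = pd.read_csv(
    io.StringIO(sample),                # file-like object instead of a path
    header=None,                        # the data has no header row
    sep=r"\s+",                         # one or more whitespace characters
    names=["Col_a", "Col_b", "Col_c"],  # column names wanted in the question
)

print(df)
```

Each run of spaces is treated as a single separator, so both rows should parse into exactly three columns with no NaN padding.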

edited Dec 31 '21 at 02:22

answered Dec 31 '21 at 02:17

