
This script reads a txt file into a DataFrame, but I want the `sep` argument to handle values that may be separated by one or more spaces. When I run the script below, I get many columns with NaN.

code:

df = pd.read_csv(data_file, header=None, sep=' ')

example txt file

blah blahh    bl
blah3 blahhe      ble

I want just 3 columns, so that I get:

Col_a  Col_b   Col_c
blah   blahh    bl
blah3  blahhe   ble
0004
  • Try `df = pd.read_csv(data_file,header = None, sep='\s+', names='Col_a Col_b Col_c'.split(' '))` using regex for one or more space characters. – Scott Boston Dec 31 '21 at 02:09

1 Answer

You can use regex as the delimiter:

pd.read_csv(data_file, header=None, delimiter=r"\s+", names='Col_a Col_b Col_c'.split(' '))

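Or you can use the delim_whitespace=True argument, which the pandas documentation describes as equivalent to sep=r"\s+" (note that it is deprecated as of pandas 2.2 in favor of sep=r"\s+"):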

pd.read_csv(data_file, header=None, delim_whitespace=True, names='Col_a Col_b Col_c'.split(' '))
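To check the behaviour without a file on disk, here is a minimal sketch that feeds the question's two sample lines through io.StringIO. The variable name raw and the in-memory buffer are assumptions for illustration only; in the real script you would pass data_file instead.

```python
import io

import pandas as pd

# Sample data mirroring the question: fields separated by one or more spaces.
# `raw` is a stand-in for the contents of data_file (an assumption).
raw = "blah blahh    bl\nblah3 blahhe      ble\n"

# sep=r"\s+" treats any run of whitespace as a single delimiter,
# so repeated spaces do not create empty (NaN) columns.
df = pd.read_csv(
    io.StringIO(raw),
    header=None,
    sep=r"\s+",
    names=["Col_a", "Col_b", "Col_c"],
)

print(df)
```

The raw-string prefix (r"...") keeps Python from interpreting \s as a string escape before the pattern reaches pandas.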

Reference: How to read file with space separated values in pandas

Viettel Solutions