1

I want to read in text files that look like this with pandas:

Flag Rootname "Target Name" RA DEC PropID "PI Name" Detector Segment LP Grating Cenwave FPPOS Exptime Nevents "Mean Flux" "Median Flux" Date "Target Description"
1 lcjw02hwq RXJ2043.1+0324 310.7761535644531 3.4143054485321045 13840 Fox FUV BOTH 2 G130M 1291 2 2935.199951171875 1472553.0 4.113247049008454e-15 3.6400732688485204e-15 2014-10-23 "ISM;ABSORPTION LINE SYSTEM;HIGH VELOCITY CLOUD"
1 lcjw02ikq RXJ2043.1+0324 310.7761535644531 3.4143054485321045 13840 Fox FUV BOTH 2 G130M 1291 4 1375.199951171875 769373.0 4.134839189383387e-15 3.562496062308341e-15 2014-10-23 "ISM;ABSORPTION LINE SYSTEM;HIGH VELOCITY CLOUD"

Each new term or value is separated by a space, but the issue is that some of these are in quotation marks and also have spaces inside. For example "Target Name" should be the name of one column, and "ISM;ABSORPTION LINE SYSTEM;HIGH VELOCITY CLOUD" is one value (although I do not need to store these with the quotation marks).

I tried the below code using quotechar as an input, but it does not work.

df = pd.read_csv(path, sep='\s', quotechar='"')

I also saw this related question, but the suggestion of adding skipinitialspace = True as an argument did not help either. When I call df.head(), I can still see that it is splitting "Target Name", etc into two separate column names. Is there a way to fix this?

curious_cosmo
  • 1,184
  • 1
  • 18
  • 36

1 Answers1

1

Try to change the sep='\s' to sep=' ':

df = pd.read_csv('<your file>', sep=' ', quotechar='"')
print(df)

Prints:

   Flag   Rootname  ...        Date                              Target Description
0     1  lcjw02hwq  ...  2014-10-23  ISM;ABSORPTION LINE SYSTEM;HIGH VELOCITY CLOUD
1     1  lcjw02ikq  ...  2014-10-23  ISM;ABSORPTION LINE SYSTEM;HIGH VELOCITY CLOUD

[2 rows x 19 columns]

df.to_csv() produces then (screenshot from LibreOffice):

enter image description here

Andrej Kesely
  • 168,389
  • 15
  • 48
  • 91