pandas read csv - recognize entries in quotation marks while using space delimiter?

Question

I want to read in text files that look like this with pandas:

Flag Rootname "Target Name" RA DEC PropID "PI Name" Detector Segment LP Grating Cenwave FPPOS Exptime Nevents "Mean Flux" "Median Flux" Date "Target Description"
1 lcjw02hwq RXJ2043.1+0324 310.7761535644531 3.4143054485321045 13840 Fox FUV BOTH 2 G130M 1291 2 2935.199951171875 1472553.0 4.113247049008454e-15 3.6400732688485204e-15 2014-10-23 "ISM;ABSORPTION LINE SYSTEM;HIGH VELOCITY CLOUD"
1 lcjw02ikq RXJ2043.1+0324 310.7761535644531 3.4143054485321045 13840 Fox FUV BOTH 2 G130M 1291 4 1375.199951171875 769373.0 4.134839189383387e-15 3.562496062308341e-15 2014-10-23 "ISM;ABSORPTION LINE SYSTEM;HIGH VELOCITY CLOUD"

Each new term or value is separated by a space, but the issue is that some of these are in quotation marks and also have spaces inside. For example "Target Name" should be the name of one column, and "ISM;ABSORPTION LINE SYSTEM;HIGH VELOCITY CLOUD" is one value (although I do not need to store these with the quotation marks).

I tried the below code using quotechar as an input, but it does not work.

df = pd.read_csv(path, sep='\s', quotechar='"')

I also saw this related question, but the suggestion of adding skipinitialspace = True as an argument did not help either. When I call df.head(), I can still see that it is splitting "Target Name", etc into two separate column names. Is there a way to fix this?

score 1 · Accepted Answer · answered Oct 22 '20 at 19:07

1

Try to change the sep='\s' to sep=' ':

df = pd.read_csv('<your file>', sep=' ', quotechar='"')
print(df)

Prints:

   Flag   Rootname  ...        Date                              Target Description
0     1  lcjw02hwq  ...  2014-10-23  ISM;ABSORPTION LINE SYSTEM;HIGH VELOCITY CLOUD
1     1  lcjw02ikq  ...  2014-10-23  ISM;ABSORPTION LINE SYSTEM;HIGH VELOCITY CLOUD

[2 rows x 19 columns]

df.to_csv() produces then (screenshot from LibreOffice):

answered Oct 22 '20 at 19:07

Andrej Kesely

168,389
15
48
91

1

Thanks so much, this worked! Just curious - do you know why '\s' did not work here? – curious_cosmo Oct 22 '20 at 19:12

pandas read csv - recognize entries in quotation marks while using space delimiter?

1 Answers1