I want to read in text files that look like this with pandas:
Flag Rootname "Target Name" RA DEC PropID "PI Name" Detector Segment LP Grating Cenwave FPPOS Exptime Nevents "Mean Flux" "Median Flux" Date "Target Description"
1 lcjw02hwq RXJ2043.1+0324 310.7761535644531 3.4143054485321045 13840 Fox FUV BOTH 2 G130M 1291 2 2935.199951171875 1472553.0 4.113247049008454e-15 3.6400732688485204e-15 2014-10-23 "ISM;ABSORPTION LINE SYSTEM;HIGH VELOCITY CLOUD"
1 lcjw02ikq RXJ2043.1+0324 310.7761535644531 3.4143054485321045 13840 Fox FUV BOTH 2 G130M 1291 4 1375.199951171875 769373.0 4.134839189383387e-15 3.562496062308341e-15 2014-10-23 "ISM;ABSORPTION LINE SYSTEM;HIGH VELOCITY CLOUD"
Each new term or value is separated by a space, but the issue is that some of these are in quotation marks and also have spaces inside. For example "Target Name"
should be the name of one column, and "ISM;ABSORPTION LINE SYSTEM;HIGH VELOCITY CLOUD"
is one value (although I do not need to store these with the quotation marks).
I tried the below code using quotechar
as an input, but it does not work.
df = pd.read_csv(path, sep='\s', quotechar='"')
I also saw this related question, but the suggestion of adding skipinitialspace = True
as an argument did not help either. When I call df.head()
, I can still see that it is splitting "Target Name", etc into two separate column names. Is there a way to fix this?