Loading .dat.gz data opened in Python into a DataFrame

Question

I have a file with a .dat.gz extension. I have tried to load all the data contained in the file into an object. Now I want to load it into a DataFrame so I can run calculations on it. The steps I have tried are shown in the shared screenshot. Kindly suggest what I can try. I also tried opening this file from cmd, but that failed.

Are you sure this is a comma-delimited file, not space-delimited or tab-delimited? You specified `sep=','`. Did you try `sep=' '` (space between quotes) or `sep='\t'`? Also, it does not look like there is a header in the file, so `header=` should be `None`, not `1`, and you should add the `names=` parameter to provide column names. — AlexK, May 07 '21 at 22:13
Thanks, AlexK, for the suggestion. I tried what you said and I am getting this error: `ParserError: Error tokenizing data. C error: Expected 228 fields in line 9, saw 231`. Also, I don't know the column names because the file is not opening. — Prachi singhal, May 08 '21 at 14:02
Search Stack Overflow if you are running into new issues. Here is a [post](https://stackoverflow.com/questions/18039057/python-pandas-error-tokenizing-data) on the error you are getting. — AlexK, May 08 '21 at 18:41
For reading compressed files: https://stackoverflow.com/questions/10566558/python-read-lines-from-compressed-text-files — AlexK, May 08 '21 at 18:47
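
To make the suggestions in these comments concrete, here is a minimal sketch of how one might peek at the raw lines of a gzip-compressed text file to see which delimiter it uses, and then pass `header=None` and `names=` to `pd.read_csv`. The file name `sample.dat.gz`, its contents, and the column names are hypothetical stand-ins created by the snippet itself; the real file's layout is not known from the question.

```python
import gzip
import os
import tempfile
from itertools import islice

import pandas as pd

# Create a tiny stand-in .dat.gz file (hypothetical data, for illustration only).
tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, "sample.dat.gz")
with gzip.open(gz_path, "wt", encoding="utf-8") as fh:
    fh.write("1 2.5 abc\n4 5.5 def\n")

# Peek at the first few raw lines to judge the delimiter and whether a header exists.
with gzip.open(gz_path, "rt", encoding="utf-8") as fh:
    first_lines = list(islice(fh, 5))
for line in first_lines:
    print(repr(line))

# The stand-in file is whitespace-delimited with no header row, so pass
# header=None and supply column names (hypothetical names here).
df = pd.read_csv(
    gz_path,
    sep=r"\s+",
    header=None,
    names=["id", "value", "label"],
    compression="gzip",
)
print(df)
```

Printing the lines with `repr` makes tabs and runs of spaces visible, which helps when choosing between `sep=','`, `sep='\t'`, and a whitespace pattern.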

score 0 · Answer 1 · answered May 10 '21 at 18:52

The file could not be opened directly in the notebook, even when using `read_fwf` with column names. So I extracted the file to .dat format using external software called PeaZip, and then loaded it into a DataFrame using `pd.read_fwf("file name")` with `names=[column names]`.
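
As a rough sketch of the same two steps done entirely in Python, the extraction could be performed with the standard-library `gzip` and `shutil` modules in place of PeaZip, followed by `pd.read_fwf` with `names=`. This is an alternative to the external tool, not what was actually run above. The file names, the fixed-width sample data, and the column names are all hypothetical and are created by the snippet so it can run on its own.

```python
import gzip
import os
import shutil
import tempfile

import pandas as pd

# Create a small fixed-width stand-in file and compress it (hypothetical data).
tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, "sample.dat.gz")
dat_path = os.path.join(tmp_dir, "sample.dat")
with gzip.open(gz_path, "wt", encoding="utf-8") as fh:
    fh.write("  1  2.50 abc\n  4  5.50 def\n")

# Step 1: extract the .gz archive to a plain .dat file.
with gzip.open(gz_path, "rb") as src, open(dat_path, "wb") as dst:
    shutil.copyfileobj(src, dst)

# Step 2: load the extracted fixed-width file; column boundaries are inferred
# from the whitespace, and the column names are supplied by hand.
df = pd.read_fwf(dat_path, header=None, names=["id", "value", "label"])
print(df)
```

If the inferred column boundaries turn out to be wrong for a real file, `read_fwf` also accepts explicit `colspecs` or `widths`.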

