
I have a file with a .dat.gz extension. I tried to load all the data in the file into an object, and now I want to load it into a DataFrame so I can run calculations on it. The steps I tried are shown in the attached screenshot. I also tried opening the file from cmd, but that failed. Please suggest what I can try.

  • Are you sure this is a comma-delimited file, not space-delimited or tab-delimited? You specified `sep=','`. Did you try `sep=' '` (space between quotes) or `sep='\t'`? Also, it does not look like there is a header in the file, so `header=` should be `None`, not `1`, and you should add the `names=` parameter to provide column names. – AlexK May 07 '21 at 22:13
  • Thanks, AlexK, for the suggestion. I tried what you said and got this error: `ParserError: Error tokenizing data. C error: Expected 228 fields in line 9, saw 231`. I also don't know the column names because the file will not open. – Prachi singhal May 08 '21 at 14:02
  • Search Stack Overflow if you are running into new issues. Here is a [post](https://stackoverflow.com/questions/18039057/python-pandas-error-tokenizing-data) on the error you are getting. – AlexK May 08 '21 at 18:41
  • For reading compressed files: https://stackoverflow.com/questions/10566558/python-read-lines-from-compressed-text-files – AlexK May 08 '21 at 18:47
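
To make the suggestions in the comments concrete, here is a minimal sketch of peeking at a gzip-compressed text file and then loading it with pandas. It assumes the data is whitespace-delimited with no header row. The sample file, its contents, and the column names `id`, `value`, and `label` are made up for illustration and are not taken from the asker's file.

```python
import gzip
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the real file: whitespace-delimited, no header row.
sample = "1 2.5 abc\n2 3.5 def\n3 4.5 ghi\n"
path = os.path.join(tempfile.mkdtemp(), "sample.dat.gz")
with gzip.open(path, "wt", encoding="utf-8") as fh:
    fh.write(sample)

# Peek at the first lines as text to see which delimiter the file really uses.
with gzip.open(path, "rt", encoding="utf-8") as fh:
    first_lines = [next(fh).rstrip("\n") for _ in range(2)]
print(first_lines)

# Load with an explicit separator, no header, and placeholder column names.
df = pd.read_csv(
    path,
    sep=r"\s+",          # one or more whitespace characters
    header=None,         # the file has no header row
    names=["id", "value", "label"],  # assumed names, replace with the real ones
    compression="gzip",  # the default "infer" would also pick this up from .gz
)
print(df)
```

Looking at the raw lines first is usually the quickest way to tell whether a "fields expected" error comes from the wrong separator or from rows that really do have different lengths.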

1 Answer


The file could not be opened directly in the notebook, even when using `read_fwf` with column names. So I extracted it to .dat format using an external tool, PeaZip, and then loaded it into a DataFrame with `pd.read_fwf("file name")` and `names=[column names]`.
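
For anyone who would rather skip the manual extraction step, a possible alternative is to decompress in Python with the standard `gzip` module and hand the open file to `pd.read_fwf`. This is only a sketch under assumptions: the sample data, the column positions in `colspecs`, and the names `id`, `name`, and `value` are hypothetical and would need to match the real file's layout.

```python
import gzip
import os
import tempfile

import pandas as pd

# Hypothetical fixed-width sample: a 3-char id, a 5-char name, a 5-char value,
# each separated by a single space.
sample = (
    "  1 alpha  2.50\n"
    " 22 beta  13.75\n"
    "333 gamma  0.10\n"
)
path = os.path.join(tempfile.mkdtemp(), "sample.dat.gz")
with gzip.open(path, "wt", encoding="utf-8") as fh:
    fh.write(sample)

# Open the compressed file in text mode and let read_fwf parse the columns.
with gzip.open(path, "rt", encoding="utf-8") as fh:
    df = pd.read_fwf(
        fh,
        colspecs=[(0, 3), (4, 9), (10, 15)],  # assumed column positions
        header=None,                          # no header row in the file
        names=["id", "name", "value"],        # assumed column names
    )
print(df)
```

If the column positions are unknown, leaving out `colspecs` lets `read_fwf` try to infer them from the first rows, which is what the call in the answer above relies on.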