9

I am working on a side project where the data is provided in a .data file. How do I open a .data file to see what the data looks like, and how do I read from a .data file programmatically in Python? I am on Mac OS X.

NOTE: The data I am working with is for one of the KDD Cup challenges.

Jason Donnald

5 Answers

3

Try opening the file in a plain text editor such as Notepad or Gedit to check which delimiter it uses (.data files are often plain text files). Once you have confirmed this, you can use the read_csv function from the pandas library in Python.

import pandas as pd
file_path = "~/AI/datasets/wine/wine.data"
# above .data file is comma delimited
wine_data = pd.read_csv(file_path, delimiter=",")
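
If you would rather check the delimiter from Python than from an editor, the standard library's csv.Sniffer can make a guess from a sample of the file. This is only a sketch: the helper name guess_delimiter, the sample size, and the list of candidate delimiters are my own assumptions, and the sniffer is a heuristic that can guess wrong or raise csv.Error on unusual files.

```python
import csv
import os


def guess_delimiter(path, sample_size=4096, candidates=",\t;| "):
    """Guess the field delimiter of a text file from its first few KB.

    `guess_delimiter`, `sample_size` and `candidates` are illustrative
    choices, not part of any library API.
    """
    path = os.path.expanduser(path)  # allow paths that start with "~"
    with open(path, "r", newline="") as f:
        sample = f.read(sample_size)
    # csv.Sniffer raises csv.Error if it cannot settle on a delimiter
    dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
    return dialect.delimiter
```

The result can then be passed on, for example as pd.read_csv(file_path, delimiter=guess_delimiter(file_path)).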
aalbagarcia
mustious
1

It depends heavily on what is in it. It could be a binary file or it could be a text file.

If it is a text file, you can open it the same way you open any other file (f=open(filename,"r")).

If it is a binary file you can just add a "b" to the open command (open(filename,"rb")). There is an example here:

Reading binary file in Python and looping over each byte

Depending on the type of data in there, you might want to try passing it through a CSV reader (Python's csv module) or an XML parsing library (lxml is one example).
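
If you are not sure which of the two cases you have, one rough way to check is to read the first bytes in binary mode and see whether they look like text. The sketch below is a heuristic under my own assumptions (the helper name looks_like_text, a 1 KB sample, and UTF-8 as the expected text encoding); a text file in another encoding could be misclassified.

```python
import codecs


def looks_like_text(path, sample_size=1024):
    """Heuristically decide whether a file is text by inspecting its first bytes.

    Hypothetical helper: assumes text files are UTF-8 (ASCII included)
    and contain no NUL bytes.
    """
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    if b"\x00" in sample:
        # NUL bytes are very rare in text files
        return False
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the sample boundary
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True
```

If this returns True, open(filename, "r") is a reasonable first attempt; otherwise fall back to open(filename, "rb").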

After further info from above and looking at the page, the format is:

Data Format
The datasets use a format similar as that of the text export format from relational databases:
One header lines with the variables names
One line per instance
Separator tabulation between the values
There are missing values (consecutive tabulations)

Therefore see this answer:

parsing a tab-separated file in Python

I would advise trying to process one line at a time rather than loading the whole file, but if you have the RAM, why not...

I suspect it doesn't open in Sublime because the file is huge, but that is just a guess.
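
For the format described above (one header line, tab-separated values, consecutive tabs for missing values), processing one line at a time could look roughly like this with the standard csv module. The function name iter_rows and the choice to represent missing values as None are my assumptions, not something the dataset page prescribes.

```python
import csv


def iter_rows(path, missing=None):
    """Yield one dict per data line of a tab-separated file with a header line.

    Hypothetical helper: empty fields (consecutive tabs) are replaced by `missing`.
    """
    with open(path, "r", newline="") as f:
        # DictReader takes the field names from the first line of the file
        reader = csv.DictReader(f, delimiter="\t")
        for row in reader:
            # Only one line is held in memory at a time
            yield {name: (value if value != "" else missing)
                   for name, value in row.items()}
```

Because this is a generator, a loop such as for row in iter_rows("dataset.data"): ... never holds more than one row in memory, whatever the size of the file.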

Community
user2539336
  • I tried to do `f=open("dataset.data","r")` and got `TypeError: descriptor 'read' of 'file' object needs an argument` error – Jason Donnald Aug 03 '15 at 22:01
0

To get a quick overview of what the file may contain, you could run strings or cat in a terminal, for example:

$ strings file.data

or

$ cat -v file.data

If you forget to pass the -v option to cat and the file is binary, you could mess up your terminal and need to reset it:

$ reset
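
A similar quick look is possible from Python without loading the whole file. This is a small sketch with assumed names (first_lines, a default of 5 lines, UTF-8 decoding); errors="replace" substitutes a placeholder character for undecodable bytes, so stray binary content will not raise an exception.

```python
from itertools import islice


def first_lines(path, n=5, encoding="utf-8"):
    """Return the first `n` lines of a file without reading the rest.

    Hypothetical helper: undecodable bytes are replaced rather than raising.
    """
    with open(path, "r", encoding=encoding, errors="replace") as f:
        # islice stops after n lines, so large files are not read in full
        return [line.rstrip("\n") for line in islice(f, n)]
```

Calling print("\n".join(first_lines("file.data"))) then shows roughly what head -n 5 file.data would.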
nbari
0

I was just dealing with this issue myself, so I thought I would share my answer. I have a .data file and was unable to open it by simply right-clicking it. macOS recommended that I open it with Xcode, so I tried that, but it did not work.

Next I tried opening it with a program named "Brackets". It is a text editor primarily used for HTML and CSS. Brackets did work.

I also tried PyCharm, as I am a Python programmer. PyCharm worked as well, and I was also able to read from the file using the following lines of code:

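# Open the file in text mode; the with block closes it automatically
with open("processed-1.cleveland.data", "r") as inf:
    # Iterate over the file line by line and print each line as-is
    for line in inf:
        print(line, end="")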
Wizard
0

It works for me.

import pandas as pd
# define your file path here (placeholder; replace with the path to your own file)
file_path = "your_file.data"
your_data = pd.read_csv(file_path, sep=',')
your_data.head()

In other words, just treat it as a CSV file if it is separated with ','. The solution is from @mustious.