In Python 3, I want to import web data with a .data extension from the link below. What code should I use to import it?

https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data

MLhacker
  • Possible duplicate of [Parsing CSV / tab-delimited txt file with Python](http://stackoverflow.com/questions/7856296/parsing-csv-tab-delimited-txt-file-with-python) – Hexaholic Feb 11 '16 at 15:25
  • It all depends on what you want to do with it next. "machine learning database" suggests you plan to use a machine learning toolkit... so use its file reading services. Same goes for something like numerical analysis in `pandas`. The most generic parser is `csv` which just gives you rows as lists. – tdelaney Feb 11 '16 at 15:29

2 Answers

0

My suggestion is:

We know the data structure: there are 9 columns, so we need only 8 splits. Start by reading the data (this assumes the file has already been saved locally as auto-mpg.data):

with open('auto-mpg.data') as fp:
    data = [line.split(maxsplit=8) for line in fp]

Now you can process each cell individually.

This function should do the job:

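import re

def handle(s):
    """Convert one raw cell to None, float, int, or a cleaned string."""
    s = s.strip()
    if s == '?':  # '?' marks a missing value in this file
        return None
    elif re.fullmatch(r'\d+\.\d*', s):  # also accepts a trailing dot, e.g. '3504.'
        return float(s)
    elif re.fullmatch(r'\d+', s):
        return int(s)
    else:
        return s.strip('"')

data = [tuple(map(handle, row)) for row in data]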

Putting it all together:

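import re

def handle(s):
    """Convert one raw cell to None, float, int, or a cleaned string."""
    s = s.strip()
    if s == '?':  # '?' marks a missing value in this file
        return None
    elif re.fullmatch(r'\d+\.\d*', s):  # also accepts a trailing dot, e.g. '3504.'
        return float(s)
    elif re.fullmatch(r'\d+', s):
        return int(s)
    else:
        return s.strip('"')

with open('auto-mpg.data') as fp:
    data = [
        tuple(map(handle, row))
        for row in (line.split(maxsplit=8) for line in fp)
    ]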

All of this was possible only because we know the data structure in advance.
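Since the question asks about reading the file straight from the web, here is a minimal sketch of the same idea without a local copy. The helper names `convert`, `parse_lines`, and `fetch_lines` are made up for illustration, the file is assumed to be UTF-8 text, and the conversion uses `try`/`except` around `int` and `float` instead of regular expressions.

```python
from urllib.request import urlopen


def convert(cell):
    """Best-effort conversion of one raw cell (hypothetical helper)."""
    cell = cell.strip()
    if cell == '?':          # '?' marks a missing value in this file
        return None
    for cast in (int, float):
        try:
            return cast(cell)
        except ValueError:
            pass
    return cell.strip('"')   # anything else stays text, without the quotes


def parse_lines(lines):
    """Split each non-blank line into at most 9 cells and convert them."""
    return [
        tuple(map(convert, line.split(maxsplit=8)))
        for line in lines
        if line.strip()
    ]


def fetch_lines(url):
    """Download a text file and return its lines (assumes UTF-8)."""
    with urlopen(url) as response:
        return response.read().decode('utf-8').splitlines()


# Usage (needs network access):
# url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
# data = parse_lines(fetch_lines(url))
```

Keeping the download separate from the parsing means `parse_lines` can also be fed an open file object or a list of strings.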

0
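import pandas as pd

url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
# The file is whitespace-separated, has no header row, and uses '?' for missing values
data = pd.read_csv(url, sep=r'\s+', header=None, na_values='?')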

This is what you get at the end:

[Screenshot: Auto-mpg data on PyCharm console]
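For this particular file, a minimal sketch of a `read_csv` call with explicit options might look like the following. It assumes the file is whitespace-separated, has no header row, and marks missing values with `?`. The column names are my own labels based on the dataset description, not something stored in the file, and `load_auto_mpg` is a hypothetical helper. The two sample rows only mimic the file layout so the example runs offline.

```python
import io

import pandas as pd

# Assumed labels; the file itself carries no header row.
COLUMNS = [
    'mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
    'acceleration', 'model_year', 'origin', 'car_name',
]


def load_auto_mpg(source):
    """Read the whitespace-separated file from a path, URL, or file-like object."""
    return pd.read_csv(
        source,
        sep=r'\s+',       # columns are separated by runs of spaces and tabs
        header=None,      # first line is data, not column names
        names=COLUMNS,
        na_values='?',    # '?' marks a missing value
    )


# Two rows in the same layout as the real file, for an offline demonstration.
sample = io.StringIO(
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n'
    '25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"\n'
)
df = load_auto_mpg(sample)
print(df.dtypes)
```

Passing the URL from the question to `load_auto_mpg` instead of `sample` should read the full dataset, since `read_csv` accepts URLs directly.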

Thomas Fritsch
M. Yasin