In Python 3, I want to import web data with a .data extension from the link below. What is the code to import it?
https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data
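My suggestion:
Knowing the data structure, there are 9 whitespace-separated columns, and the last one (the quoted car name) can itself contain spaces, so we need only 8 splits. Start by reading the data (this assumes the file has already been downloaded to the working directory as auto-mpg.data):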
with open('auto-mpg.data') as fp:
    data = [line.split(maxsplit=8) for line in fp]
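To see why 8 splits are enough, here is a small sketch on a single line. The sample line is written out by hand in the file's layout and is an assumption for illustration; only the built-in str.split is involved.

```python
# One line in the layout of auto-mpg.data (hand-written sample, assumed for illustration).
sample = '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n'

# With maxsplit=8, splitting stops after the 8th separator,
# so the car name stays in one piece, spaces and all.
parts = sample.split(maxsplit=8)

print(len(parts))        # number of fields
print(repr(parts[-1]))   # last field still carries its quotes and trailing newline
```

The leftover quotes and newline on the last field are why each cell still needs cleaning afterwards.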
Now you can convert each cell you have.
This function should do the job:
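import re

def handle(s):
    # '?' is treated as a missing value
    if s == '?':
        return None
    # Float: digits, a dot, optional digits (also accepts a trailing dot, e.g. '3504.')
    elif re.match(r'^\d+\.\d*$', s):
        return float(s)
    # Integer: digits only
    elif re.match(r'^\d+$', s):
        return int(s, 10)
    # Anything else is text: drop surrounding whitespace and quotes
    else:
        return s.strip().strip('"')

data = [tuple(map(handle, row)) for row in data]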
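As an alternative to the regular expressions, here is a hedged sketch that simply tries int, then float, and falls back to text. The name convert is hypothetical and not part of the suggestion above; this variant also accepts signed numbers and forms such as '3504.'.

```python
def convert(s):
    """Convert one raw cell to None, int, float, or cleaned text (hypothetical helper)."""
    s = s.strip()
    if s == '?':            # '?' is treated as a missing value
        return None
    for cast in (int, float):
        try:
            return cast(s)
        except ValueError:
            pass            # not this numeric type, try the next one
    return s.strip('"')     # plain text: drop the surrounding quotes

row = ['18.0', '8', '3504.', '?', '"chevrolet chevelle malibu"\n']
print([convert(cell) for cell in row])
```

Note that float() also accepts strings such as 'nan' or 'inf', so this is looser than a strict pattern.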
Putting it all together:
import re

def handle(s):
    if s == '?':
        return None
    elif re.match(r'^\d+\.\d*$', s):  # also accepts a trailing dot, e.g. '3504.'
        return float(s)
    elif re.match(r'^\d+$', s):
        return int(s, 10)
    else:
        return s.strip().strip('"')

with open('auto-mpg.data') as fp:
    data = [
        tuple(map(handle, row))
        for row in (line.split(maxsplit=8) for line in fp)
    ]
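The code above reads a local copy, while the question asks about a web address. Here is a minimal sketch of fetching the file with the standard library's urllib.request and parsing it the same way. The names parse_lines and load_from_url are hypothetical, and the text is assumed to decode as UTF-8.

```python
import re
from urllib.request import urlopen

URL = 'https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'

def handle(s):
    # Same idea as the handle function above, repeated so this block runs on its own.
    if s == '?':
        return None
    if re.match(r'^\d+\.\d*$', s):
        return float(s)
    if re.match(r'^\d+$', s):
        return int(s)
    return s.strip().strip('"')

def parse_lines(lines):
    """Split each non-empty line into 9 fields and convert every field."""
    return [
        tuple(map(handle, line.split(maxsplit=8)))
        for line in lines
        if line.strip()
    ]

def load_from_url(url=URL):
    """Download the file and parse it (assumes UTF-8 compatible text)."""
    with urlopen(url) as response:
        text = response.read().decode('utf-8')
    return parse_lines(text.splitlines())
```

Calling load_from_url() needs network access and returns a list of tuples, one per line of the file.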
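But all of this was possible only because we know the data structure. Alternatively, pandas can read the file straight from the URL. The file is whitespace-separated and has no header row, so read_csv has to be told that, and '?' should be treated as a missing value:
import pandas as pd
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data'
data = pd.read_csv(url, sep=r'\s+', header=None, na_values='?')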
This is what you get at the end: [screenshot of the auto-mpg data in the PyCharm console]
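For readable column names, here is a hedged sketch of a read_csv call with a names list. The column names are assumptions based on the dataset's usual description, since the file itself has no header row, and a two-line in-memory sample stands in for the URL so the block runs offline.

```python
import io
import pandas as pd

# Assumed column names (the file itself has no header row).
COLUMNS = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
           'acceleration', 'model_year', 'origin', 'car_name']

# Hand-written sample in the file's layout; pass the URL instead to load the real data.
sample = io.StringIO(
    '18.0   8   307.0      130.0      3504.      12.0   70  1\t"chevrolet chevelle malibu"\n'
    '25.0   4   98.00      ?          2046.      19.0   71  1\t"ford pinto"\n'
)

df = pd.read_csv(
    sample,
    sep=r'\s+',        # fields are separated by runs of spaces or tabs
    header=None,       # no header row in the data
    names=COLUMNS,
    na_values='?',     # '?' is read as a missing value (NaN)
)
print(df.dtypes)
print(df)
```

With na_values='?', a column containing '?' is read as floats with NaN where the value is missing, rather than as text.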