
I am trying to access this file from URL:

https://data.princeton.edu/wws509/datasets/copen.dat

However, I am unable to access it and split it for training and testing purposes.

Does someone have a solution for this?

Thanks

I have run the following code, which downloads the file but prints its contents as a single bytes string rather than as a table. Now how can I access the data? For example, if I want to access certain rows and columns, how would I do that?

import urllib.request

weburl = urllib.request.urlopen('https://data.princeton.edu/wws509/datasets/cuse.dat')

print('result code: ' + str(weburl.getcode()))
data = weburl.read()  # read() returns bytes, which is why print() shows b'...' instead of a table
print(data)
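The raw output is a bytes object, so it has to be decoded to text and then split into rows and fields before individual rows and columns can be accessed. Below is a minimal sketch using only the standard library. `fetch_text` and `parse_whitespace_table` are hypothetical helper names, and the sketch assumes the file is plain UTF-8 (or ASCII) text whose columns are separated by runs of spaces.

```python
import urllib.request
from typing import List


def fetch_text(url: str, encoding: str = "utf-8") -> str:
    """Download `url` and decode the response body from bytes to str."""
    with urllib.request.urlopen(url) as response:
        # read() returns bytes; decode() turns them into a normal string
        return response.read().decode(encoding)


def parse_whitespace_table(text: str) -> List[List[str]]:
    """Split text into rows, and each row into whitespace-separated fields.

    Blank lines are skipped, so rows[0] is the first non-empty line
    (usually the header).
    """
    return [line.split() for line in text.splitlines() if line.strip()]
```

Usage (requires network access): `rows = parse_whitespace_table(fetch_text('https://data.princeton.edu/wws509/datasets/cuse.dat'))`, after which `rows[0]` is the header line and `rows[i][j]` is field `j` of line `i`. All fields are still strings, so convert numeric columns with `int()` or `float()` as needed.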
marc_s
Khubaib123
  • Telling us "I was unable to access it" is useless. Show us the code you tried. – John Gordon Apr 24 '19 at 16:30
  • This is a complex task with multiple problems. Can you point out where you have the problems: accessing the file via URL, reading the file, or splitting the columns? Can you show us what you have done? As this looks a bit like homework, are there any special constraints, like a limited number of packages that are allowed? – oekopez Apr 24 '19 at 16:33
  • Please show us what you have tried so far. Coding is fun if you actually do it. Just do it! – jose_bacoy Apr 24 '19 at 16:35
  • This is what I have done so far. – Khubaib123 Apr 24 '19 at 16:42
  • I have edited the answer; could you please check it? – Pallamolla Sai Apr 24 '19 at 17:23

1 Answer


To do this, you need to install the `requests` module for Python (for example with `pip install requests`).

As @nekomatic suggests, you can convert the data into a proper format (a pandas DataFrame) by following the question "Getting list of lists into pandas DataFrame".

import requests

response = requests.get('https://data.princeton.edu/wws509/datasets/copen.dat')
response.raise_for_status()  # stop early on HTTP errors such as 404
data = response.text  # the server returns text/plain, so response.json() would fail here

print("data is ")
print(data)

# Split the text into lines, then split each line on runs of whitespace.
# data_by_line becomes a list of lists: one inner list of fields per line.
data_by_line = [line.split() for line in data.split('\n')]

print(data_by_line[2][2])  # output will be "low"
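To read a whole column out of `data_by_line` by name, and to split the rows into training and test sets without pandas, something like the following sketch may help. `get_column` and `train_test_split_rows` are hypothetical helpers, not library functions. The sketch assumes the first row is the header, and that data rows with exactly one more field than the header start with a row label (an assumption about the file layout; check the printed data to confirm it).

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def get_column(rows: List[List[str]], name: str) -> List[str]:
    """Return the values in the column whose header is `name`.

    Assumes rows[0] is the header row. Data rows with exactly one more
    field than the header are assumed to begin with a row label, so their
    fields are shifted by one position. Empty rows (blank lines) are skipped.
    """
    header, data = rows[0], rows[1:]
    idx = header.index(name)  # raises ValueError if no column has this name
    values = []
    for row in data:
        if not row:
            continue  # blank line in the original text
        offset = 1 if len(row) == len(header) + 1 else 0
        values.append(row[idx + offset])
    return values


def train_test_split_rows(
    data: Sequence[T], test_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of `data` and hold out `test_fraction` of it as a test set."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

For example, `get_column(data_by_line, 'influence')` would return that column as strings (assuming the header contains `influence`), and `train_test_split_rows([r for r in data_by_line[1:] if r])` returns `(train, test)` lists of rows. If scikit-learn is installed, `sklearn.model_selection.train_test_split` does the same job.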
Pallamolla Sai
  • I have done this, but how can I access a certain column in the data now? I tried to access the age column and this is the error I get: AttributeError: 'str' object has no attribute 'iloc' – Khubaib123 Apr 24 '19 at 16:46
  • `iloc` is used to access data from a Pandas DataFrame. This answer returns the data as a list of lists. [This question](https://stackoverflow.com/questions/19112398/getting-list-of-lists-into-pandas-dataframe) should help you convert one to the other, or as one answer suggests you might be able to create a DataFrame directly from the source file. – nekomatic Apr 25 '19 at 12:09
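Following that comment, here is a sketch of creating the DataFrame directly with pandas and then making a random train/test split. It assumes pandas is installed and that the file is whitespace-delimited; `load_table`, `load_table_from_text` and `split_train_test` are hypothetical helper names. When the header row has one field fewer than the data rows (a leading row label), `pd.read_csv` uses that first column as the index, which is what this sketch relies on.

```python
import io

import pandas as pd


def load_table(source) -> pd.DataFrame:
    """Read a whitespace-delimited table into a DataFrame.

    `source` may be a URL, a file path or a file-like object. The regex
    separator treats any run of spaces as a single delimiter.
    """
    return pd.read_csv(source, sep=r"\s+")


def load_table_from_text(text: str) -> pd.DataFrame:
    """Same as load_table, for text already downloaded (e.g. response.text)."""
    return load_table(io.StringIO(text))


def split_train_test(df: pd.DataFrame, test_fraction: float = 0.2, seed: int = 0):
    """Hold out a random `test_fraction` of the rows as the test set."""
    test = df.sample(frac=test_fraction, random_state=seed)
    train = df.drop(index=test.index)  # the remaining rows form the training set
    return train, test
```

Usage (requires network access): `df = load_table('https://data.princeton.edu/wws509/datasets/cuse.dat')`. After that, `df.columns` lists the column names, `df['age']` selects a column by name (assuming the header contains `age`), and `df.iloc[:, 0]` selects the first column by position. The `'str' object has no attribute 'iloc'` error above comes from calling `.iloc` on a string, not on a DataFrame.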