0

I'm trying to convert this Breast Cancer Wisconsin data set from a list to a data frame with columns.

Here is the data set: http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data

These are the column names:

   #  Attribute                     Domain
   -- -----------------------------------------
   1. Sample code number            id number
   2. Clump Thickness               1 - 10
   3. Uniformity of Cell Size       1 - 10
   4. Uniformity of Cell Shape      1 - 10
   5. Marginal Adhesion             1 - 10
   6. Single Epithelial Cell Size   1 - 10
   7. Bare Nuclei                   1 - 10
   8. Bland Chromatin               1 - 10
   9. Normal Nucleoli               1 - 10
  10. Mitoses                       1 - 10
  11. Class:                        (2 for benign, 4 for malignant)

I imported the data set into Python like this:

import requests

link = "http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data"
f = requests.get(link)

print (f.text)

and I see the data as lines of comma-separated values:

1000025,5,1,1,1,2,1,3,1,1,2
1002945,5,4,4,5,7,10,3,2,1,2
1015425,3,1,1,1,2,2,3,1,1,2
1016277,6,8,8,1,3,4,3,7,1,2
1017023,4,1,1,3,2,1,3,1,1,2

I need to split the comma-separated values into columns and add names to the columns.

I tried this, but it didn't work:

import requests
import pandas as pd
import io

urlData = requests.get(f.text).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')))

4 Answers

0

This will do the trick:

import requests

# Column names, in the same order as the fields in the data file
headers = ['sample', 'Clump Thickness', 'Uniformity of Cell Size', 'Uniformity of Cell Shape', 'Marginal Adhesion', 'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin', 'Normal Nucleoli', 'Mitoses', 'Class']
r = requests.get("http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data").text

# Write the header row, then the raw comma-separated data beneath it
with open('c:\\users\\user\\desktop\\data.csv', 'w') as csvFile:
    csvFile.write(','.join(headers) + "\n")
    csvFile.write(r)
0

The following worked for me:

import pandas as pd
import requests
link = "http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data"
f = requests.get(link)
# separate each line
newf = f.text.splitlines()
# create pandas dataframe
df = pd.DataFrame([x.split(",") for x in newf])
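
The data frame built this way has default integer column labels, and every value is a string. A minimal sketch of adding the column names and converting to numbers follows, assuming the names from the question's attribute list. The `sample_text` variable is a stand-in for `f.text` so the example runs without a network connection.

```python
import pandas as pd

# Stand-in for f.text: the first rows shown in the question
sample_text = """1000025,5,1,1,1,2,1,3,1,1,2
1002945,5,4,4,5,7,10,3,2,1,2
1015425,3,1,1,1,2,2,3,1,1,2"""

# Column names taken from the attribute list in the question
names = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size',
         'Uniformity of Cell Shape', 'Marginal Adhesion',
         'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
         'Normal Nucleoli', 'Mitoses', 'Class']

# Split each non-empty line on commas and label the columns
rows = [line.split(",") for line in sample_text.splitlines() if line]
df = pd.DataFrame(rows, columns=names)

# split() produces strings; convert each column to numbers,
# turning any non-numeric entry into NaN instead of raising
df = df.apply(pd.to_numeric, errors="coerce")
```

The `errors="coerce"` choice is an assumption about how to treat entries that are not numbers; drop it if you would rather get an error on unexpected values.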
Ricoleto
-1
import requests
import pandas as pd
import io

names = ['Sample code number',
         'Clump Thickness',
         'Uniformity of Cell Size',
         'Uniformity of Cell Shape',
         'Marginal Adhesion',
         'Single Epithelial Cell Size',
         'Bare Nuclei',
         'Bland Chromatin',
         'Normal Nucleoli',
         'Mitoses',
         'Class']

link = "http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data"
csv_text = requests.get(link).text
# if you don't care about column names, omit names=names and pass header=None instead
df = pd.read_csv(io.StringIO(csv_text), names=names)
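
For the `header=None` variant mentioned in the comment above, here is a small sketch that runs on an in-memory sample instead of the downloaded text. The second sample row is illustrative only: it uses `?` as a placeholder for a missing value, which I believe this data set does in the Bare Nuclei column, and `na_values="?"` tells pandas to read that placeholder as NaN.

```python
import io
import pandas as pd

# Stand-in for csv_text; the second row is an illustrative row with a "?" placeholder
sample_text = "1000025,5,1,1,1,2,1,3,1,1,2\n1057013,8,4,5,1,2,?,7,3,1,4\n"

# header=None stops pandas from treating the first data row as column names;
# columns are then labeled 0..10
df_no_names = pd.read_csv(io.StringIO(sample_text), header=None, na_values="?")
```

Without `header=None` (and without `names=`), the first data row would be consumed as the header, so one sample would silently go missing.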
Steven Rumbalski
-1

There is probably a better way to do this, but I sent the output to a CSV file with a static header line. Since the data is already comma-delimited, I thought this would be the easiest way.

import requests

def main():
    outputFile = 'someName.csv'
    link = "http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/breast-cancer-wisconsin.data"
    f = requests.get(link)
    # Static header line; the value ranges from the data description are kept in the names
    headerLine = "Sample code number(id number),Clump Thickness(1 - 10),Uniformity of Cell Size(1 - 10),Uniformity of Cell Shape(1 - 10),Marginal Adhesion(1 - 10),Single Epithelial Cell Size(1 - 10),Bare Nuclei(1 - 10),Bland Chromatin(1 - 10),Normal Nucleoli(1 - 10),Mitoses(1 - 10),Class:(2 for benign - 4 for malignant)"
    data = f.text
    # Write the header first, then the comma-delimited data exactly as downloaded
    with open(outputFile, "w") as ofile:
        ofile.write(headerLine + '\n')
        ofile.write(data)
    print("Success")

if __name__ == '__main__':
    main()
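
Once a file with a header line exists, getting the data frame the question asks for is one more step. The sketch below writes a shortened, hypothetical version of the file to a temporary directory (plain column names, two sample rows from the question) and reads it back; `pd.read_csv` takes the first line as the column names by default.

```python
import os
import tempfile

import pandas as pd

# Shortened stand-ins for headerLine and the downloaded data
header_line = ("Sample code number,Clump Thickness,Uniformity of Cell Size,"
               "Uniformity of Cell Shape,Marginal Adhesion,"
               "Single Epithelial Cell Size,Bare Nuclei,Bland Chromatin,"
               "Normal Nucleoli,Mitoses,Class")
data = "1000025,5,1,1,1,2,1,3,1,1,2\n1002945,5,4,4,5,7,10,3,2,1,2\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "someName.csv")
    # Same write pattern as above: header line first, then the data
    with open(path, "w") as ofile:
        ofile.write(header_line + "\n")
        ofile.write(data)
    # The first line of the file becomes the column names
    df = pd.read_csv(path)
```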