I have written the following code:
df = pd.read_csv('breast-cancer-wisconsin.data.csv', nrows= 300)
columns_df = pd.DataFrame(columns = ['column_name', 'value'])
columns = df.columns.values
for index, row in df.iterrows():
for column in columns:
columns_df = columns_df.append({'column_name': column, 'value': row[column]}, ignore_index=True)
This script reads a csv file to a pandas dataframe and then appends each column name and it's corresponding value to a new dataframe. If I run
print(columns_df[-10:])
I get the following output
column_name value
2300 clump_thickness 5
2301 unif_cell_size 5
2302 unif_cell_shape 7
2303 marg_adhesion 8
2304 single_epith_cell_size 6
2305 bare_nuclei 10
2306 bland_chrom 7
2307 norm_nucleoli 4
2308 mitoses 1
2309 class 4
2310 clump_thickness 5
2311 unif_cell_size 3
2312 unif_cell_shape 4
2313 marg_adhesion 3
2314 single_epith_cell_size 4
2315 bare_nuclei 5
2316 bland_chrom 4
2317 norm_nucleoli 7
2318 mitoses 1
2319 class 2
2320 clump_thickness 5
2321 unif_cell_size 4
2322 unif_cell_shape 3
2323 marg_adhesion 1
2324 single_epith_cell_size 2
2325 bare_nuclei -99999
2326 bland_chrom 2
2327 norm_nucleoli 3
2328 mitoses 1
2329 class 2
2330 clump_thickness 8
2331 unif_cell_size 2
2332 unif_cell_shape 1
2333 marg_adhesion 1
2334 single_epith_cell_size 5
2335 bare_nuclei 1
2336 bland_chrom 1
2337 norm_nucleoli 1
2338 mitoses 1
2339 class 2
2340 clump_thickness 9
2341 unif_cell_size 1
2342 unif_cell_shape 2
2343 marg_adhesion 6
2344 single_epith_cell_size 4
2345 bare_nuclei 10
2346 bland_chrom 7
2347 norm_nucleoli 7
2348 mitoses 2
2349 class 4
Unfortunately, this script is not very fast and will take a very long time to execute with large dataframes.
Question: Is there a more elegant/faster way to achieve these results?
A sample from my input data (the csv file):
id,clump_thickness,unif_cell_size,unif_cell_shape,
marg_adhesion,single_epith_cell_size,
bare_nuclei,bland_chrom,norm_nucleoli,mitoses,class
1000025,5,1,1,1,2,1,3,1,1,2
1002945,5,4,4,5,7,10,3,2,1,2
1015425,3,1,1,1,2,2,3,1,1,2
1016277,6,8,8,1,3,4,3,7,1,2
1017023,4,1,1,3,2,1,3,1,1,2
1017122,8,10,10,8,7,10,9,7,1,4
1018099,1,1,1,1,2,10,3,1,1,2
1018561,2,1,2,1,2,1,3,1,1,2
1033078,2,1,1,1,2,1,1,1,5,2
1033078,4,2,1,1,2,1,2,1,1,2
1035283,1,1,1,1,1,1,3,1,1,2
1036172,2,1,1,1,2,1,2,1,1,2
1041801,5,3,3,3,2,3,4,4,1,4
1043999,1,1,1,1,2,3,3,1,1,2
1044572,8,7,5,10,7,9,5,5,4,4
1047630,7,4,6,4,6,1,4,3,1,4
1048672,4,1,1,1,2,1,2,1,1,2
1049815,4,1,1,1,2,1,3,1,1,2
1050670,10,7,7,6,4,10,4,1,2,4
1050718,6,1,1,1,2,1,3,1,1,2
1054590,7,3,2,10,5,10,5,4,4,4
1054593,10,5,5,3,6,7,7,10,1,4
1056784,3,1,1,1,2,1,2,1,1,2
1057013,8,4,5,1,2,?,7,3,1,4
1059552,1,1,1,1,2,1,3,1,1,2
1065726,5,2,3,4,2,7,3,6,1,4
1066373,3,2,1,1,1,1,2,1,1,2
1066979,5,1,1,1,2,1,2,1,1,2
1067444,2,1,1,1,2,1,2,1,1,2
1070935,1,1,3,1,2,1,1,1,1,2
1070935,3,1,1,1,1,1,2,1,1,2
1071760,2,1,1,1,2,1,3,1,1,2
1072179,10,7,7,3,8,5,7,4,3,4
1074610,2,1,1,2,2,1,3,1,1,2
1075123,3,1,2,1,2,1,2,1,1,2