
I've been working through the TensorFlow tutorials online (specifically the housing prices tutorial: https://colab.research.google.com/github/tensorflow/models/blob/master/samples/core/tutorials/keras/basic_regression.ipynb).

I've been trying to upload my own CSV file for a similar project using Google Colab, but I can't seem to get the format right. I'm very new to this, so I can't find a solution that I can understand.

from __future__ import absolute_import, division, print_function

import tensorflow as tf
from tensorflow import keras

import numpy as np
import pandas as pd

print(tf.__version__)

#Import the csv files

from google.colab import files
uploaded = files.upload()

for fn in uploaded.keys():
  print('User uploaded file "{name}" with length {length} bytes'.format(
  name = fn, length = len(uploaded[fn])))

# This is where I upload my csv file

import io

df = pd.read_csv(io.StringIO(uploaded[ 'data.csv'].decode('utf-8')))
df.head()

(train_data, train_labels), (test_data, test_labels) = uploaded.load_data()

# Shuffle the training set
order = np.argsort(np.random.random(train_labels.shape))
train_data = train_data[order]
train_labels = train_labels[order]
print(boston_housing)

This is where the problem is: I can't seem to separate my data into training and test sets.

My data.csv has just 5 columns: columns 1-2 contain two sets of inputs, column 3 contains the label, and columns 4-5 contain test input data.
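For a layout like this, one option is to pick the columns by position with `iloc`, so the code does not depend on the header names. This is a minimal sketch that assumes the file has a header row and that the last two columns (4 and 5) hold the test inputs. The sample rows and column names (`x1`, `x2`, `label`, `test_x1`, `test_x2`) are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the uploaded data.csv; the values and names are made up.
csv_text = """x1,x2,label,test_x1,test_x2
1.0,2.0,3.0,4.0,5.0
1.5,2.5,4.0,5.5,6.5
2.0,3.0,5.0,6.0,7.0
"""
df = pd.read_csv(io.StringIO(csv_text))

# Select columns by position (0-based, end-exclusive), not by header name.
train_data = df.iloc[:, 0:2].to_numpy()   # columns 1-2: inputs
train_labels = df.iloc[:, 2].to_numpy()   # column 3: label
test_data = df.iloc[:, 3:5].to_numpy()    # columns 4-5: test inputs (assumed)

print(train_data.shape, train_labels.shape, test_data.shape)
```

With the layout as described there is no label column for the test inputs. They can be fed to `model.predict`, but they can't be used to measure accuracy.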

Again, I'm a massive newbie, so any help would be amazing! I'm so confused.

1 Answer


I'm guessing this line is the problem:

(train_data, train_labels), (test_data, test_labels) = uploaded.load_data()

uploaded is the return value of files.upload(), and it doesn't have a load_data method. Instead, files.upload() saves a copy of each uploaded file to the local filesystem and returns a dict that maps each filename to that file's contents as bytes.
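The sketch below shows what that dict looks like and how to turn its bytes into a DataFrame. Because `google.colab.files` only works inside Colab, it fakes the `uploaded` dict with a small hand-written CSV; the filename and contents are made up for illustration.

```python
import io

import pandas as pd

# Stand-in for the dict returned by files.upload(): filename -> raw bytes.
# The contents are hypothetical so the example runs outside Colab.
uploaded = {"data.csv": b"a,b,label\n1,2,0\n3,4,1\n"}

for name, content in uploaded.items():
    print(name, type(content).__name__, len(content))

# A plain dict has no load_data() method, so parse the bytes with pandas instead.
df = pd.read_csv(io.BytesIO(uploaded["data.csv"]))
print(df.shape)
```

Since Colab also writes the file to the working directory, `pd.read_csv("data.csv")` should work there too, without going through the dict.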

You've already got the data as a DataFrame in df. So, to split it into training and test sets, you could follow one of the recipes suggested here: How do I create test and train samples from one dataframe with pandas?
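One common pandas approach is to draw a random fraction of rows with `DataFrame.sample` and use the remaining rows as the test set. This is a minimal sketch: the 80/20 ratio, the `random_state` value, and the column names (`x1`, `x2`, `label`) are assumptions for illustration. The output is arranged into the same `(train_data, train_labels), (test_data, test_labels)` shape the tutorial gets from `load_data()`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: two input columns and one label column (names are made up).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x1": rng.random(10),
    "x2": rng.random(10),
    "label": rng.random(10),
})

# Randomly take 80% of the rows for training; the rows left over form the test set.
train_df = df.sample(frac=0.8, random_state=42)
test_df = df.drop(train_df.index)

# Separate inputs from labels and convert to NumPy arrays, mirroring load_data().
feature_cols = ["x1", "x2"]
(train_data, train_labels), (test_data, test_labels) = (
    (train_df[feature_cols].to_numpy(), train_df["label"].to_numpy()),
    (test_df[feature_cols].to_numpy(), test_df["label"].to_numpy()),
)

print(train_data.shape, test_data.shape)
```

`sample` already returns the rows in random order, so the separate `np.argsort` shuffle step from the tutorial isn't needed after this split.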

Bob Smith