Questions tagged [training-data]

A training set is a set of data used to discover potentially predictive relationships. Training sets are used in fields such as artificial intelligence, machine learning, and statistics.

1782 questions
161
votes
6 answers

Parameter "stratify" from method "train_test_split" (scikit-learn)

I am trying to use train_test_split from the scikit-learn package, but I am having trouble with the stratify parameter. Here is the code: from sklearn import cross_validation, datasets X = iris.data[:,:2] y =…
Daneel Olivaw
  • 2,077
  • 4
  • 15
  • 23
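
For readers unfamiliar with the term, the idea behind a stratified split can be sketched in plain Python. This is an illustration only, not scikit-learn's implementation: the function name `stratified_split` and its parameters are hypothetical, and it relies only on the standard library. In scikit-learn itself, the corresponding call is `train_test_split(X, y, stratify=y)` from `sklearn.model_selection`.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx) so each class is split in the same proportion.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        # Shuffle within the class, then carve off the test share.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])

    return sorted(train_idx), sorted(test_idx)


# Toy labels: 8 samples of class "a" and 4 of class "b".
labels = ["a"] * 8 + ["b"] * 4
train_idx, test_idx = stratified_split(labels, test_fraction=0.25)
```

With these toy labels, the test set receives a quarter of each class, so the 2:1 class ratio is preserved in both parts.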
100
votes
4 answers

What is validation data used for in a Keras Sequential model?

My question is simple: what is the validation data passed to model.fit in a Sequential model used for? And does it affect how the model is trained (normally a validation set is used, for example, to choose hyper-parameters in a model, but I think…
danidc
  • 1,309
  • 3
  • 11
  • 11
82
votes
17 answers

How to demonstrate to management that mediocre developers are hurting the team

I am in the precarious position of "managing" a team of developers at a small company. I say "managing" because although I assign work and provide feedback on their performance I have no recourse in actually disciplining an individual. Some of my…
deleteme
77
votes
4 answers

Normalize data before or after split of training and testing data?

I want to separate my data into training and test sets. Should I apply normalization to the data before or after the split? Does it make any difference when building a predictive model?
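
One commonly recommended option is to split first, compute the normalization statistics on the training set only, and reuse them on the test set, so that no information from the test data influences preprocessing. The sketch below shows that order of operations with the standard library; `fit_scaler` and `transform` are hypothetical helper names and the numbers are toy values.

```python
from statistics import mean, pstdev


def fit_scaler(values):
    """Compute mean and population standard deviation from training data only."""
    mu = mean(values)
    sigma = pstdev(values)
    # Guard against a constant feature, which would divide by zero.
    return mu, (sigma if sigma > 0 else 1.0)


def transform(values, mu, sigma):
    """Standardize values with previously fitted statistics."""
    return [(v - mu) / sigma for v in values]


# Toy data, assumed to be already split.
train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

mu, sigma = fit_scaler(train)                # statistics come from train only
train_scaled = transform(train, mu, sigma)
test_scaled = transform(test, mu, sigma)     # test reuses the train statistics
```

Normalizing before the split would instead compute `mu` and `sigma` over all six values, which lets the test data affect how the training data is scaled.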
57
votes
6 answers

Training data for sentiment analysis

Where can I get a corpus of documents that have already been classified as positive/negative for sentiment in the corporate domain? I want a large corpus of documents that provide reviews for companies, like reviews of companies provided by analysts…
London guy
  • 27,522
  • 44
  • 121
  • 179
41
votes
7 answers

Data sets for neural network training

I am looking for some relatively simple data sets for testing and comparing different training methods for artificial neural networks. I would like data that won't take too much pre-processing to turn it into my input format of a list of inputs and…
Jeff Thomas
  • 847
  • 2
  • 7
  • 9
41
votes
6 answers

Publicly Available Spam Filter Training Set

I'm new to machine learning, and for my first project I'd like to write a naive Bayes spam filter. I was wondering if there are any publicly available training sets of labeled spam/not spam emails, preferably in plain text and not a dump of a…
JeremyKun
  • 2,987
  • 2
  • 24
  • 44
34
votes
4 answers

How to train a model in Node.js (TensorFlow.js)?

I want to make an image classifier, but I don't know Python. TensorFlow.js works with JavaScript, which I am familiar with. Can models be trained with it, and what would be the steps to do so? Frankly, I have no clue where to start. The only thing I…
Alex
  • 66,732
  • 177
  • 439
  • 641
34
votes
4 answers

Altering trained images to train neural network

I am currently trying to make a program to differentiate rotten oranges and edible oranges solely based on their external appearance. To do this, I am planning on using a Convolutional Neural Network to train with rotten oranges and normal oranges.…
28
votes
7 answers

R: How to split a data frame into training, validation, and test sets?

I'm using R to do machine learning. Following standard machine learning methodology, I would like to randomly split my data into training, validation, and test data sets. How do I do that in R? I know there are some related questions on how to split…
stackoverflowuser2010
  • 38,621
  • 48
  • 169
  • 217
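
The random three-way split described in this question can be sketched as follows. The example is in Python rather than R, and the function name `three_way_split` and the 60/20/20 default fractions are assumptions chosen for illustration.

```python
import random


def three_way_split(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle rows and slice them into training, validation, and test parts.

    Hypothetical helper; whatever is left after train and validation
    becomes the test set.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)

    n_train = round(len(rows) * train_frac)
    n_val = round(len(rows) * val_frac)

    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]
    return train, val, test


# Ten toy rows, identified by their index.
train, val, test = three_way_split(range(10))
```

Shuffling once and slicing guarantees the three parts are disjoint and together cover every row.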
27
votes
2 answers

What does the class_mode parameter in Keras image_gen.flow_from_directory() signify?

train_image_gen = image_gen.flow_from_directory('/Users/harshpanwar/Desktop/Folder/train', target_size=image_shape[:2], batch_size=batch_size, …
26
votes
1 answer

How do I train tesseract 4 with image data instead of a font file?

I'm trying to train Tesseract 4 with images instead of fonts. The docs explain only the approach with fonts, not with images. I know how it works in a prior version of Tesseract, but I don't understand how to use the box/tiff…
claim
  • 506
  • 6
  • 13
26
votes
5 answers

Split tensor into training and test sets

Let's say I've read in a text file using a TextLineReader. Is there some way to split this into train and test sets in TensorFlow? Something like: def read_my_file_format(filename_queue): reader = tf.TextLineReader() key, record_string =…
Luke
  • 6,699
  • 13
  • 50
  • 88
22
votes
3 answers

SVM classifier based on HOG features for "object detection" in OpenCV

I have a project in which I want to detect objects in images; my aim is to use HOG features. Using the OpenCV SVM implementation, I could find the code for detecting people, and I read some papers about tuning the parameters in order to detect…
Mario
  • 1,469
  • 7
  • 29
  • 46
16
votes
1 answer

How to fine-tune an existing TensorFlow Object Detection model to recognize additional classes?

Thanks to Google for providing a few pre-trained models with the TensorFlow API. I would like to know how to retrain a pre-trained model available from the above repository by adding new classes to the model. For example, the trained COCO dataset model…