Questions tagged [data-science]

Use this tag for implementation questions about data science. Data science concerns extracting knowledge or insights from data in any shape or form. It can include predictive analytics and usually involves a lot of data wrangling. General questions about data science should be posted to their respective communities.

Data science is an interdisciplinary field that uses scientific methods, processes, and systems to extract knowledge and insights from data in various forms, both structured and unstructured.

Wikipedia

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead. Otherwise you're probably off-topic.

9099 questions
273
votes
17 answers

'Conda' is not recognized as internal or external command

I installed Anaconda3 4.4.0 (32 bit) on my Windows 7 Professional machine and imported NumPy and Pandas on Jupyter notebook so I assume Python was installed correctly. But when I type conda list and conda --version in command prompt, it says conda…
Kshitiz
  • 3,431
  • 4
  • 14
  • 22
209
votes
8 answers

Where do I call the BatchNormalization function in Keras?

If I want to use the BatchNormalization function in Keras, then do I need to call it once only at the beginning? I read this documentation for it: http://keras.io/layers/normalization/ I don't see where I'm supposed to call it. Below is my code…
pr338
  • 8,730
  • 19
  • 52
  • 71
206
votes
8 answers

Unable to allocate array with shape and data type

I'm facing an issue with allocating huge arrays in numpy on Ubuntu 18 while not facing the same issue on MacOS. I am trying to allocate memory for a numpy array with shape (156816, 36, 53806) with np.zeros((156816, 36, 53806), dtype='uint8') and…
Martin Brisiak
  • 3,872
  • 12
  • 37
  • 51
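For a sense of scale, the size of the requested array can be worked out by hand. The following is a minimal sketch, assuming one byte per uint8 element; the variable names are illustrative and not taken from the question.

```python
import math

# Shape and dtype quoted in the question above.
shape = (156816, 36, 53806)
bytes_per_element = 1  # assumption: uint8 occupies one byte per element

# Total memory the dense array would need.
total_bytes = math.prod(shape) * bytes_per_element
total_gib = total_bytes / 2**30

print(f"{total_bytes:,} bytes is about {total_gib:.1f} GiB")
```

Under these assumptions the array needs roughly 283 GiB, so whether the allocation succeeds depends on how each operating system handles memory requests that exceed physical RAM.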
122
votes
6 answers

How to load a model from an HDF5 file in Keras?

How to load a model from an HDF5 file in Keras? What I tried: model = Sequential() model.add(Dense(64, input_dim=14, init='uniform')) model.add(LeakyReLU(alpha=0.3)) model.add(BatchNormalization(epsilon=1e-06, mode=0, momentum=0.9,…
pr338
  • 8,730
  • 19
  • 52
  • 71
96
votes
17 answers

fit_transform() takes 2 positional arguments but 3 were given with LabelBinarizer

I am totally new to machine learning and I have been working with an unsupervised learning technique. The image shows my sample data (after all cleaning). Screenshot: Sample Data. I have these two pipelines built to clean the data: num_attribs =…
Viral Parmar
  • 1,155
  • 2
  • 8
  • 8
81
votes
8 answers

ValueError: Wrong number of items passed - Meaning and suggestions?

I am receiving the error: ValueError: Wrong number of items passed 3, placement implies 1, and I am struggling to figure out where and how to begin addressing the problem. I don't really understand the meaning of the error, which is making it…
Gary
  • 2,137
  • 3
  • 23
  • 41
78
votes
3 answers

Difference between StratifiedKFold and StratifiedShuffleSplit in sklearn

As the title says, I am wondering what the difference is between StratifiedKFold with the parameter shuffle=True, StratifiedKFold(n_splits=10, shuffle=True, random_state=0), and StratifiedShuffleSplit, StratifiedShuffleSplit(n_splits=10,…
77
votes
4 answers

Normalize data before or after split of training and testing data?

I want to separate my data into train and test sets. Should I apply normalization to the data before or after the split? Does it make any difference when building a predictive model?
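A common practice is to split first and fit the scaler on the training portion only, so that no statistics from the test set leak into training. The sketch below assumes scikit-learn's `StandardScaler` and synthetic data; the array shapes and the 80/20 split are illustrative choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption): 100 samples, 3 features, binary target.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(100, 3))
y = rng.integers(0, 2, size=100)

# Split first, so the test set plays no part in the scaling statistics.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean/std from train only
X_test_scaled = scaler.transform(X_test)        # reuse the training statistics
```

The test set is transformed with the training mean and standard deviation, so its scaled columns will generally not have exactly zero mean; that is expected.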
65
votes
5 answers

Scikit-learn's LabelBinarizer vs. OneHotEncoder

What is the difference between the two? It seems that both create new columns, whose number equals the number of unique categories in the feature. They then assign 0 or 1 to each data point depending on which category it is in.
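The two encoders can be compared on a small example. This is a sketch with made-up category values: `LabelBinarizer` is given a 1-D sequence of labels, while `OneHotEncoder` is given a 2-D array of features (one column here).

```python
from sklearn.preprocessing import LabelBinarizer, OneHotEncoder

# Made-up categories (assumption) used only for illustration.
colors = ["red", "green", "blue", "red"]

# LabelBinarizer takes a 1-D sequence of labels.
label_matrix = LabelBinarizer().fit_transform(colors)

# OneHotEncoder takes 2-D input (samples x features) and returns a sparse
# matrix by default, so it is converted to a dense array for display.
feature_matrix = OneHotEncoder().fit_transform([[c] for c in colors]).toarray()

# With only two classes, LabelBinarizer produces a single 0/1 column.
binary_matrix = LabelBinarizer().fit_transform(["yes", "no", "yes"])

print(label_matrix.shape, feature_matrix.shape, binary_matrix.shape)
```

For three or more categories the two outputs contain the same 0/1 pattern. The practical differences are the expected input shape (labels versus feature columns) and the single-column output of `LabelBinarizer` in the two-class case.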
53
votes
2 answers

What does the standard Keras model output mean? What is epoch and loss in Keras?

I have just built my first model using Keras and this is the output. It looks like the standard output you get after building any Keras artificial neural network. Even after looking in the documentation, I do not fully understand what the epoch is…
pr338
  • 8,730
  • 19
  • 52
  • 71
52
votes
5 answers

How to do superscripts and subscripts in Jupyter Notebook?

I want to use numbers to indicate references in footnotes, so I was wondering how I can use superscripts and subscripts inside a Jupyter Notebook.
PraveenHarris
  • 799
  • 1
  • 7
  • 10
47
votes
6 answers

Apply StandardScaler to parts of a data set

I want to use sklearn's StandardScaler. Is it possible to apply it to some feature columns but not others? For instance, say my data is: data = pd.DataFrame({'Name' : [3, 4,6], 'Age' : [18, 92,98], 'Weight' : [68, 59,49]}) Age Name Weight 0 …
mitsi
  • 1,005
  • 2
  • 11
  • 15
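One possible approach is to scale the chosen columns separately and join them back to the untouched ones. The sketch below reuses the DataFrame from the question; the choice of `Age` and `Weight` as the columns to scale is an assumption, since the excerpt is truncated.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# DataFrame from the question.
data = pd.DataFrame(
    {"Name": [3, 4, 6], "Age": [18, 92, 98], "Weight": [68, 59, 49]}
)

# Assumption: only these columns should be standardized.
cols_to_scale = ["Age", "Weight"]

# Scale the selected columns and wrap the result in a DataFrame again.
scaled_part = pd.DataFrame(
    StandardScaler().fit_transform(data[cols_to_scale]),
    columns=cols_to_scale,
    index=data.index,
)

# Join the untouched columns with the scaled ones.
result = pd.concat([data.drop(columns=cols_to_scale), scaled_part], axis=1)
print(result)
```

scikit-learn's `ColumnTransformer` offers a similar per-column selection that can be placed inside a pipeline.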
44
votes
2 answers

How to tell which Keras model is better?

I don't understand which accuracy in the output to use to compare my 2 Keras models to see which one is better. Do I use the "acc" (from the training data?) one or the "val acc" (from the validation data?) one? There are different accs and val accs…
pr338
  • 8,730
  • 19
  • 52
  • 71
43
votes
1 answer

Logistic Regression PMML won't Produce Probabilities

As part of a machine-learning deployment project, I built a proof-of-concept where I created two simple logistic regression models for a binary classification task using R's glm function and Python's scikit-learn. Afterwards, I converted those…
FatihAkici
  • 4,679
  • 2
  • 31
  • 48
37
votes
2 answers

What is the difference between Spyder and Jupyter?

I am learning Python for data science, but my problem is that I still don't understand the difference between Spyder and Jupyter! I would like you guys to help me to understand the difference, please; I would appreciate that.
Amir Boutaghou
  • 545
  • 1
  • 4
  • 6