Questions tagged [data-science]

Implementation questions about data science. Data science concerns extracting knowledge or insights from data, in whatever shape or form. It can include predictive analytics and usually involves a lot of data wrangling. General, non-implementation questions about data science should be posted to the communities dedicated to them.

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from data in various forms, both structured and unstructured, similar to data-mining.

Wikipedia

NOTE: If you want to use this tag for a question not directly concerning implementation, then consider posting on Cross Validated, Data Science, or Artificial Intelligence instead. Otherwise you're probably off-topic.

9099 questions

273

votes

17 answers

'Conda' is not recognized as internal or external command

I installed Anaconda3 4.4.0 (32 bit) on my Windows 7 Professional machine and imported NumPy and Pandas in a Jupyter notebook, so I assume Python was installed correctly. But when I type conda list and conda --version in the command prompt, it says conda…

python anaconda conda data-science

asked Jun 13 '17 at 08:09

Kshitiz


209

votes

8 answers

Where do I call the BatchNormalization function in Keras?

If I want to use the BatchNormalization function in Keras, then do I need to call it once only at the beginning? I read this documentation for it: http://keras.io/layers/normalization/ I don't see where I'm supposed to call it. Below is my code…

python keras neural-network data-science batch-normalization

asked Jan 11 '16 at 07:47

pr338


206

votes

8 answers

Unable to allocate array with shape and data type

I'm facing an issue with allocating huge arrays in numpy on Ubuntu 18 while not facing the same issue on MacOS. I am trying to allocate memory for a numpy array with shape (156816, 36, 53806) with np.zeros((156816, 36, 53806), dtype='uint8') and…

python numpy data-science

asked Aug 15 '19 at 09:48

Martin Brisiak
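
The failure described in the excerpt above is easier to see once the requested size is worked out. A rough sketch in plain Python, assuming one byte per element for uint8; the helper name `array_nbytes` is made up for illustration and is not part of NumPy:

```python
import math

def array_nbytes(shape, itemsize):
    """Return the number of bytes a dense array of this shape would need."""
    return math.prod(shape) * itemsize

# Shape and dtype taken from the question; uint8 uses 1 byte per element.
shape = (156816, 36, 53806)
nbytes = array_nbytes(shape, itemsize=1)

print(nbytes)                    # total bytes requested
print(round(nbytes / 2**30, 1))  # the same figure expressed in GiB
```

The request comes to several hundred GiB, which is likely more than the physical memory of most machines. Whether the call fails or appears to succeed can depend on how the operating system handles large allocations.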


122

votes

6 answers

How to load a model from an HDF5 file in Keras?

How to load a model from an HDF5 file in Keras? What I tried: model = Sequential() model.add(Dense(64, input_dim=14, init='uniform')) model.add(LeakyReLU(alpha=0.3)) model.add(BatchNormalization(epsilon=1e-06, mode=0, momentum=0.9,…

python machine-learning keras data-science

asked Jan 29 '16 at 00:03

pr338


votes

17 answers

fit_transform() takes 2 positional arguments but 3 were given with LabelBinarizer

I am totally new to machine learning and have been working with unsupervised learning techniques. The image shows my sample data (after all cleaning). I have built these two pipelines to clean the data: num_attribs =…

python scikit-learn data-science

asked Sep 11 '17 at 19:12

Viral Parmar


votes

8 answers

ValueError: Wrong number of items passed - Meaning and suggestions?

I am receiving the error: ValueError: Wrong number of items passed 3, placement implies 1, and I am struggling to figure out where, and how I may begin addressing the problem. I don't really understand the meaning of the error; which is making it…

python pandas prediction data-science

asked Apr 04 '17 at 01:35

Gary


votes

3 answers

difference between StratifiedKFold and StratifiedShuffleSplit in sklearn

As the title says, I am wondering what the difference is between StratifiedKFold with the parameter shuffle=True, StratifiedKFold(n_splits=10, shuffle=True, random_state=0), and StratifiedShuffleSplit, StratifiedShuffleSplit(n_splits=10,…

python machine-learning scikit-learn data-science cross-validation

asked Aug 30 '17 at 20:43

gabboshow


votes

4 answers

Normalize data before or after split of training and testing data?

I want to separate my data into train and test sets. Should I apply normalization to the data before or after the split? Does it make any difference when building a predictive model?

machine-learning data-science normalization training-data train-test-split

asked Mar 23 '18 at 07:13

hemant
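
A commonly given answer to the question above is to split first and estimate the normalization statistics from the training set alone, so that no information from the test set leaks into the model. A minimal sketch in plain Python, assuming a single numeric feature with non-zero variance; the toy data and the helper names `fit_standardizer` and `standardize` are hypothetical:

```python
import statistics

def fit_standardizer(values):
    """Learn the mean and population standard deviation from training values only."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return mean, std

def standardize(values, mean, std):
    """Apply previously learned statistics; nothing is re-estimated here."""
    return [(v - mean) / std for v in values]

# Hypothetical toy data, already split into train and test.
train = [1.0, 2.0, 3.0, 4.0, 5.0]
test = [6.0, 0.0]

# Statistics come from the training split and are reused for the test split.
mean, std = fit_standardizer(train)
train_scaled = standardize(train, mean, std)
test_scaled = standardize(test, mean, std)
```

The test values are scaled with the training mean and standard deviation, so they are not guaranteed to have zero mean or unit variance themselves. That is expected with this approach.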

votes

5 answers

Scikit-learn's LabelBinarizer vs. OneHotEncoder

What is the difference between the two? It seems that both create new columns, whose number equals the number of unique categories in the feature. Then they assign 0 or 1 to each data point depending on which category it is in.

python encoding scikit-learn data-science categorical-data

asked May 22 '18 at 17:25

Roozbeh Bakhshi

votes

2 answers

What does the standard Keras model output mean? What is epoch and loss in Keras?

I have just built my first model using Keras and this is the output. It looks like the standard output you get after building any Keras artificial neural network. Even after looking in the documentation, I do not fully understand what the epoch is…

python machine-learning neural-network keras data-science

asked Jan 08 '16 at 09:22

pr338


votes

5 answers

How to do superscripts and subscripts in Jupyter Notebook?

I want to use numbers to indicate references in footnotes, so I was wondering how I can use superscripts and subscripts inside a Jupyter Notebook.

python jupyter-notebook jupyter data-science

asked Sep 02 '17 at 08:08

PraveenHarris

votes

6 answers

Apply StandardScaler to parts of a data set

I want to use sklearn's StandardScaler. Is it possible to apply it to some feature columns but not others? For instance, say my data is: data = pd.DataFrame({'Name' : [3, 4,6], 'Age' : [18, 92,98], 'Weight' : [68, 59,49]}) Age Name Weight 0 …

python pandas scikit-learn scale data-science

asked Jul 17 '16 at 11:47

mitsi


votes

2 answers

How to tell which Keras model is better?

I don't understand which accuracy in the output to use to compare my 2 Keras models to see which one is better. Do I use the "acc" (from the training data?) one or the "val acc" (from the validation data?) one? There are different accs and val accs…

python machine-learning keras data-science

asked Jan 10 '16 at 04:23

pr338


votes

1 answer

Logistic Regression PMML won't Produce Probabilities

As part of a machine-learning deployment project, I built a proof-of-concept where I created two simple logistic regression models for a binary classification task using R's glm function and python's scikit-learn. Afterwards, I converted those…

python r data-science pmml knime

asked Nov 02 '18 at 02:05

FatihAkici


votes

2 answers

What is the difference between Spyder and Jupyter?

I am learning Python for data science, but my problem is that I still don't understand the difference between Spyder and Jupyter! I would like you guys to help me to understand the difference, please; I would appreciate that.

python data-science jupyter spyder

asked Nov 19 '18 at 04:18

Amir Boutaghou
