Questions tagged [kaggle]

Relating to Competitions, Datasets, Kernels, Learn, or Kaggle's API.

1115 questions
68 votes
2 answers

Create a set from a series in pandas

I have a dataframe extracted from Kaggle's San Francisco Salaries dataset (https://www.kaggle.com/kaggle/sf-salaries), and I wish to create a set of the values of a column, for instance 'Status'. This is what I have tried, but it brings a list of all the…
Julio Arriaga
  • 911
  • 1
  • 10
  • 13
60 votes
13 answers

Using Kaggle Datasets in Google Colab

Is it possible to use any datasets available via the Kaggle API in Google Colab? I see the Kaggle API is used in this Colab notebook, but it's a bit unclear to me what datasets it provides access to.
hdiz
  • 1,141
  • 2
  • 13
  • 27
50 votes
2 answers

Setting environment variables in Google Colab

I'm trying to use the Kaggle CLI API, and in order to do that, instead of using kaggle.json for authentication, I'm using environment variables to set the credentials. !pip install --upgrade kaggle !export KAGGLE_USERNAME=abcdefgh !export…
Bohrium272
  • 656
  • 1
  • 5
  • 5
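
A likely reason the `!export` lines in the excerpt above have no effect is that each `!` command in a notebook runs in its own subshell, so an exported variable disappears as soon as that command finishes. The following is a minimal sketch, assuming the credentials are set from Python instead. The username reuses the placeholder from the excerpt; `KAGGLE_KEY` is the variable the Kaggle API reads alongside `KAGGLE_USERNAME`, and its value here is a hypothetical placeholder.

```python
import os
import subprocess
import sys

# Placeholder credentials: "abcdefgh" comes from the excerpt, the key is hypothetical.
os.environ["KAGGLE_USERNAME"] = "abcdefgh"
os.environ["KAGGLE_KEY"] = "replace-with-your-api-key"

# Child processes started afterwards inherit variables set through os.environ.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['KAGGLE_USERNAME'])"],
    capture_output=True,
    text=True,
    check=True,
)
seen_by_child = child.stdout.strip()
print(seen_by_child)
```

Because the variables live in the notebook process itself, later `!kaggle ...` commands started from the same session should see them as well.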
39 votes
6 answers

Can't find kaggle.json file in google colab

I'm trying to download the Kaggle ImageNet Object Localization Challenge data into Google Colab so that I can use it to train my model. Kaggle uses an API for easy and fast access to their datasets (https://github.com/Kaggle/kaggle-api). However,…
Diego Domenig
  • 397
  • 1
  • 3
  • 7
39 votes
4 answers

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte, while reading csv file in pandas

I know similar questions have been asked already; I have seen all of them and tried their suggestions, but they were of little help. I am using OS X 10.11 El Capitan and Python 3.6 in a virtual environment (I also tried without one). I am using Jupyter Notebook and Spyder 3. I am new to…
shubham_827
  • 405
  • 1
  • 4
  • 10
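
The byte 0x8b at position 1 in the error above matches the second byte of the gzip magic number (0x1f 0x8b), so one plausible cause is that the file is gzip-compressed rather than plain CSV. The following is a minimal standard-library sketch of checking for that and reading the file accordingly. The helper names, the file name, and the file contents are made up for illustration.

```python
import csv
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip(path):
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def read_rows(path):
    """Read CSV rows, decompressing on the fly when the file is gzipped."""
    opener = gzip.open if is_gzip(path) else open
    with opener(path, "rt", encoding="utf-8", newline="") as fh:
        return list(csv.reader(fh))


# Hypothetical demo file: a gzipped CSV saved under a misleading .csv name.
demo_path = os.path.join(tempfile.mkdtemp(), "train.csv")
with gzip.open(demo_path, "wt", encoding="utf-8", newline="") as fh:
    fh.write("id,label\n1,a\n2,b\n")

rows = read_rows(demo_path)
print(rows)
```

In pandas, the equivalent fix would be passing `compression="gzip"` to `pd.read_csv`, assuming the file really is gzipped.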
36 votes
2 answers

What is OOF approach in machine learning?

In many Kaggle notebooks I have seen people talk about the OOF approach when they do machine learning with K-Fold validation. What is OOF, and is it related to K-Fold validation? Also, can you suggest some useful resources for it to get the concept in…
Nikhil Mishra
  • 1,182
  • 2
  • 18
  • 34
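
"OOF" is commonly expanded as "out-of-fold": during K-Fold validation, each sample is predicted by a model that was trained without that sample's fold. The following is a minimal plain-Python sketch of the idea, assuming a trivial "model" that predicts the mean of its training targets as a stand-in for a real learner. The function name, the modulo fold assignment, and the data are all illustrative choices.

```python
def out_of_fold_predictions(y, n_splits):
    """Predict each target using only the targets from the other folds."""
    n_samples = len(y)
    oof = [None] * n_samples
    for fold in range(n_splits):
        # Assign samples to folds by index modulo n_splits (illustrative choice).
        valid_idx = [i for i in range(n_samples) if i % n_splits == fold]
        train_idx = [i for i in range(n_samples) if i % n_splits != fold]
        # Stand-in "model": the mean of the training targets.
        prediction = sum(y[i] for i in train_idx) / len(train_idx)
        for i in valid_idx:
            oof[i] = prediction
    return oof


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # made-up targets
oof = out_of_fold_predictions(y, n_splits=3)
print(oof)  # [4.0, 3.5, 3.0, 4.0, 3.5, 3.0]
```

Every entry of `oof` comes from a model that never saw that sample, which is why such predictions are often used to estimate generalization error or as inputs to a second-level model.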
30 votes
2 answers

Working with neuralnet in R for the first time: get "requires numeric/complex matrix/vector arguments"

I'm in the process of attempting to learn to work with neural networks in R. As a learning problem, I've been using the following problem over at Kaggle: Don't worry, this problem is specifically designed for people to learn with, there's no reward…
user2548029
  • 425
  • 2
  • 6
  • 10
28 votes
11 answers

Linear model function lm() error: NA/NaN/Inf in foreign function call (arg 1)

Say I have data.frame a, and I use m.fit <- lm(col2 ~ col3 * col4, na.action = na.exclude). col2 has some NA values; col3 and col4 have values less than 1. I keep getting Error in lm.fit(x, y, offset = offset, singular.ok = singular.ok, ...) :…
Pk.yd
  • 311
  • 1
  • 3
  • 6
22 votes
1 answer

Pandas error - invalid value encountered

I'm new to Pandas. I downloaded and installed Anaconda. Then I tried running the following code via the Spyder app: import pandas as pd import numpy as np train = pd.read_csv('/Users/Ben/Documents/Kaggle/Titanic/train.csv') train Although this…
Ben
  • 20,038
  • 30
  • 112
  • 189
21 votes
2 answers

documentation for Kaggle API *within* python?

I want to write a Python script that downloads a public dataset from Kaggle.com. The Kaggle API is written in Python, but almost all of the documentation and resources that I can find are on how to use the API from the command line, and very little on…
Antoine
  • 600
  • 7
  • 19
20 votes
3 answers

What does KFold in python exactly do?

I am looking at this tutorial: https://www.dataquest.io/mission/74/getting-started-with-kaggle. I got to part 9, making predictions. It has some data in a dataframe called titanic, which is then divided up into folds using: # Generate cross…
user
  • 2,015
  • 6
  • 22
  • 39
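
Conceptually, k-fold splitting partitions the row indices into k groups and, for each group in turn, treats it as the test set while the remaining rows form the training set. The following is a plain-Python sketch of that general idea, not scikit-learn's actual implementation. The function name `kfold_split` and the sample sizes are hypothetical.

```python
def kfold_split(n_samples, n_folds):
    """Return a list of (train_indices, test_indices), one pair per fold."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, n_folds)
    splits = []
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds take one additional sample each.
        stop = start + base + (1 if fold < extra else 0)
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        splits.append((train, test))
        start = stop
    return splits


splits = kfold_split(n_samples=7, n_folds=3)
for train, test in splits:
    print("train:", train, "test:", test)
```

Each index lands in exactly one test fold, so across all folds every row is used for testing once and for training k - 1 times.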
15 votes
1 answer

How to view the nearest neighbors in R?

Let me start by saying I have no experience with R, KNN or data science in general. I recently found Kaggle and have been playing around with the Digit Recognition competition/tutorial. In this tutorial they provide some sample code to get you…
Abe Miessler
  • 82,532
  • 99
  • 305
  • 486
14 votes
5 answers

General techniques to work with huge amounts of data on a non-super computer

I'm taking some AI classes and have learned about some basic algorithms that I want to experiment with. I have gotten access to several data sets containing lots of great real-world data through Kaggle, which hosts data analysis competitions. I have…
Rishi
  • 3,538
  • 5
  • 29
  • 40
14 votes
8 answers

Download Kaggle Dataset by using Python

I have been trying to download a Kaggle dataset using Python. However, I ran into issues with the request method: the downloaded output .csv files are corrupted HTML files. import requests # The direct link to the Kaggle data set data_url…
Johnson
  • 141
  • 1
  • 1
  • 4
13 votes
3 answers

mount google drive in kaggle notebook

In Google Colab, I can easily mount my Google Drive with this: from google.colab import drive drive.mount('/content/gdrive') In a Kaggle notebook, however, it gives this error: KeyError Traceback (most recent call…
Kasra
  • 1,959
  • 1
  • 19
  • 29