6

I am trying to import some data from Kaggle into a notebook. I receive a 401 Unauthorized error, even though I have accepted the competition rules and I am able to download the data.

This is the code I am running:

from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
files = api.competition_download_files("twosigmanews")
api.competitions_submit("submission.csv", "my submission message", "twosigmanews")

EDIT: Added more of the error. No matter which Kaggle dataset I try to import, I get the same error.


ApiException                              Traceback (most recent call last)
<ipython-input-7-65a92f19da82> in <module>()
      2 
      3 api = KaggleApi()
----> 4 files = api.competition_download_files("twosigmanews")
      5 api.competitions_submit("submission.csv", "my submission message", "twosigmanews")

~\Anaconda3\lib\site-packages\kaggle\api\kaggle_api_extended.py in competition_download_files(self, competition, path, force, quiet)
    637             quiet: suppress verbose output (default is False)
    638         """
--> 639         files = self.competition_list_files(competition)
    640         if not files:
    641             print('This competition does not have any available data files')

~\Anaconda3\lib\site-packages\kaggle\api\kaggle_api_extended.py in competition_list_files(self, competition)
    554         """
    555         competition_list_files_result = self.process_response(
--> 556             self.competitions_data_list_files_with_http_info(id=competition))
    557         return [File(f) for f in competition_list_files_result]
    558 

~\Anaconda3\lib\site-packages\kaggle\api\kaggle_api.py in competitions_data_list_files_with_http_info(self, id, **kwargs)
    416             _preload_content=params.get('_preload_content', True),
    417             _request_timeout=params.get('_request_timeout'),
--> 418             collection_formats=collection_formats)
    419 
    420     def competitions_list(self, **kwargs):  # noqa: E501

~\Anaconda3\lib\site-packages\kaggle\api_client.py in call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, async_req, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
    332                                    response_type, auth_settings,
    333                                    _return_http_data_only, collection_formats,
--> 334                                    _preload_content, _request_timeout)
    335         else:
    336             thread = self.pool.apply_async(self.__call_api, (resource_path,

~\Anaconda3\lib\site-packages\kaggle\api_client.py in __call_api(self, resource_path, method, path_params, query_params, header_params, body, post_params, files, response_type, auth_settings, _return_http_data_only, collection_formats, _preload_content, _request_timeout)
    163             post_params=post_params, body=body,
    164             _preload_content=_preload_content,
--> 165             _request_timeout=_request_timeout)
    166 
    167         self.last_response = response_data

~\Anaconda3\lib\site-packages\kaggle\api_client.py in request(self, method, url, query_params, headers, post_params, body, _preload_content, _request_timeout)
    353                                         _preload_content=_preload_content,
    354                                         _request_timeout=_request_timeout,
--> 355                                         headers=headers)
    356         elif method == "HEAD":
    357             return self.rest_client.HEAD(url,

~\Anaconda3\lib\site-packages\kaggle\rest.py in GET(self, url, headers, query_params, _preload_content, _request_timeout)
    249                             _preload_content=_preload_content,
    250                             _request_timeout=_request_timeout,
--> 251                             query_params=query_params)
    252 
    253     def HEAD(self, url, headers=None, query_params=None, _preload_content=True,

~\Anaconda3\lib\site-packages\kaggle\rest.py in request(self, method, url, query_params, headers, body, post_params, _preload_content, _request_timeout)
    239 
    240         if not 200 <= r.status <= 299:
--> 241             raise ApiException(http_resp=r)
    242 
    243         return r

ApiException: (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'private', 'Content-Length': '37', 'Content-Type': 'application/json; charset=utf-8', 'X-MiniProfiler-Ids': '["b1df1310-4d5b-4000-8f43-e5b6f4958a48","b9dcdaa4-64ef-4be1-bbbe-90fe664a81bd","db1868eb-0a12-4217-a89a-5cbb3946b0e7","b8166dda-a74f-4e64-8bd4-fe529e95bf04","205f9250-b5eb-4cfd-b94c-976778be8f17","229360b9-37d4-456f-b030-9e56879d7c84"]', 'X-Frame-Options': 'SAMEORIGIN', 'Referrer-Policy': 'strict-origin-when-cross-origin', 'Set-Cookie': 'ARRAffinity=87506ffb959c51b2ba135ec75a7dffc3bc28e2948e5cb4ee012d8d916b147438;Path=/;HttpOnly;Domain=www.kaggle.com', 'Date': 'Sat, 06 Oct 2018 16:23:01 GMT'})
HTTP response body: {"code":401,"message":"Unauthorized"}
user113156

6 Answers

4

I think that the name of the competition is wrong. Try:

from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()  # credentials are read from kaggle.json, so nothing needs to be pasted here
api.authenticate()
files = api.competition_download_files("two-sigma-financial-news")
anitasp
  • This seems to throw up no errors but I cannot find the original Kaggle post to play around with the data to make sure it imported correctly. – user113156 Oct 15 '18 at 22:48
2

While looking through the source code, I found the KaggleApi class. It appears that the notebook does not authenticate automatically when you call KaggleApi(), so you need to call authenticate() on the API object before making any requests to Kaggle.

Try:

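from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()  # loads the credentials, e.g. from kaggle.json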

I was able to connect and download the samples after this call.
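
If authenticate() still fails, it can help to check first whether any credentials are visible at all. The following is only a minimal sketch using the standard library. It assumes the client looks for the KAGGLE_USERNAME and KAGGLE_KEY environment variables, or for a kaggle.json file in ~/.kaggle (or in the directory named by KAGGLE_CONFIG_DIR). The function name credentials_source is made up for this example and is not part of the kaggle package.

```python
import os
from pathlib import Path


def credentials_source(environ=None, config_dir=None):
    """Return "environment", "file", or None, depending on where credentials are visible.

    Assumption: the client accepts either the KAGGLE_USERNAME / KAGGLE_KEY
    environment variables or a kaggle.json file in the config directory.
    """
    environ = os.environ if environ is None else environ
    if environ.get("KAGGLE_USERNAME") and environ.get("KAGGLE_KEY"):
        return "environment"

    if config_dir is None:
        # Fall back to ~/.kaggle unless KAGGLE_CONFIG_DIR points elsewhere.
        config_dir = environ.get("KAGGLE_CONFIG_DIR") or Path.home() / ".kaggle"
    if (Path(config_dir) / "kaggle.json").is_file():
        return "file"
    return None
```

Calling credentials_source() with no arguments inspects the real environment. If it returns None, authenticate() most likely has nothing to work with.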

TOBlender
  • While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value. – Nic3500 Oct 15 '18 at 05:03
2

Here is another way to import your data through the Kaggle API. I assume you are working on a cloud instance running Linux.

Here is how I do it:

  1. Get your kaggle.json file from your Kaggle account page: https://www.kaggle.com/<username>/account
  2. Run this code to make sure kaggle.json is in the right directory.

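    import json
    import os

    # os.chdir("~/.kaggle") fails because "~" is not expanded, so build the path explicitly
    kaggle_dir = os.path.expanduser("~/.kaggle")
    os.makedirs(kaggle_dir, exist_ok=True)
    data = {"username": "username", "key": "tokenvalue"}  # copy these values from your kaggle.json file
    with open(os.path.join(kaggle_dir, "kaggle.json"), "w") as outfile:
        json.dump(data, outfile)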
    
  3. In a terminal, change to the directory where you want to store the data, then run: kaggle competitions download -c two-sigma-financial-news

This setup remains available every time you want to import data through the Kaggle API.
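
The file-writing step above can also be wrapped in a small helper that restricts the file to its owner, which is what the chmod 600 in the Colab answer in this thread does. This is a rough sketch, not an official recipe. The name write_kaggle_json and its config_dir parameter are made up for illustration, and the username and key must be the real values from your own kaggle.json.

```python
import json
import os
import stat
from pathlib import Path


def write_kaggle_json(username, key, config_dir=None):
    """Write kaggle.json into config_dir (default: ~/.kaggle) and return its path."""
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    config_dir.mkdir(parents=True, exist_ok=True)
    path = config_dir / "kaggle.json"
    # Create the file with owner-only permissions; on Windows the mode is largely ignored.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as outfile:
        json.dump({"username": username, "key": key}, outfile)
    # Tighten permissions in case the file already existed with a looser mode.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return path
```

For example, write_kaggle_json("username", "tokenvalue") would create ~/.kaggle/kaggle.json with those placeholder values.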

smerllo
2

Your code does not supply any credentials: your Kaggle username and, most importantly, your API key. The key is issued from your Kaggle account once you are logged in with your user id and password.

To authenticate, call the api.authenticate() function after assigning the KaggleApi instance to a variable named "api".

Jishan Shaikh
1

Your username and key are either not provided or invalid.

Go to https://www.kaggle.com/username/account and create a new API token. A kaggle.json file will be downloaded. Place it at ~/.kaggle/kaggle.json or C:\Users\User\.kaggle\kaggle.json.

Also, you have to click "I understand and accept" in the Rules Acceptance section for the data you are going to download.
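
To narrow down whether the file itself is the problem, a quick check along these lines may help. It is only a sketch under the assumption that kaggle.json is a JSON object with non-empty "username" and "key" fields. The function diagnose_kaggle_json is a hypothetical helper, not something the kaggle package provides.

```python
import json
from pathlib import Path


def diagnose_kaggle_json(config_dir):
    """Return a list of human-readable problems with config_dir/kaggle.json (empty if none)."""
    config_dir = Path(config_dir)
    path = config_dir / "kaggle.json"
    if not path.is_file():
        # A misnamed file (for example a typo in the file name) is a common cause.
        others = sorted(p.name for p in config_dir.glob("*.json")) if config_dir.is_dir() else []
        hint = f"; found instead: {', '.join(others)}" if others else ""
        return [f"{path} does not exist{hint}"]
    try:
        data = json.loads(path.read_text())
    except ValueError:
        return [f"{path} is not valid JSON"]
    if not isinstance(data, dict):
        return [f"{path} does not contain a JSON object"]
    return [f"'{field}' is missing or empty" for field in ("username", "key") if not data.get(field)]
```

Pass it the directory that should hold the file, for example diagnose_kaggle_json(Path.home() / ".kaggle"). An empty list means the file looks structurally fine, although the key could still be expired or revoked.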

prisar
  • I did this for the previous problem I was having, it solved that problem and now I have an unauthorised issue. – user113156 Oct 06 '18 at 17:05
  • You have to click "I understand and accept" in the Rules Acceptance section for the data you are going to download. Did you accept? – prisar Oct 06 '18 at 17:09
  • I have accepted the rules, and I am able to download the data. I just cannot import it directly from the kaggle website. – user113156 Oct 06 '18 at 17:24
  • @user113156 "I just cannot import it directly from the kaggle website" - What does that mean? – prisar Oct 06 '18 at 17:49
  • I am under the impression that `from kaggle.competitions import twosigmanews` allows me to download the data directly from kaggle? – user113156 Oct 06 '18 at 17:52
  • I just tried that again and get this error `ModuleNotFoundError: No module named 'kaggle.competitions'` – user113156 Oct 06 '18 at 18:02
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/181404/discussion-between-t-prisar-and-user113156). – prisar Oct 06 '18 at 18:13
0

Colab is also a good way to import a Kaggle dataset. The steps are:

! pip install kaggle
! mkdir ~/.kaggle
! cp kaggle.json ~/.kaggle/
! chmod 600 ~/.kaggle/kaggle.json
! kaggle datasets download -d rohanrao/air-quality-data-in-india

  • Hi! Welcome to SO. You can surround code in backticks ("`") in order to make it format correctly. In addition, can you explain why your answer will help the question-asker? It seems like your code may help, but it doesn't answer the question. – Pro Q Oct 19 '22 at 06:07