I am trying to start a machine learning study using a table stored in Google BigQuery. The table is nearly 36 GB and has roughly 300 million rows across 6 columns. I tried the code below:
from google.colab import auth
auth.authenticate_user()
print('Authenticated')

import os
from google.cloud import bigquery

project_id = 'my_project_name'
os.environ["GOOGLE_CLOUD_PROJECT"] = project_id
client = bigquery.Client(project=project_id)

def getClient():
    return bigquery.Client()

def executeQuery(query):
    client = getClient()
    # Runs the query and loads the entire result set into a pandas DataFrame
    return client.query(query).to_dataframe()

ml_query = '''SELECT * FROM `my_table_name`'''
results = executeQuery(ml_query)
When I limit the query, e.g. ml_query='''SELECT * FROM my_table_name LIMIT 1000''', the code runs. That is why I think the problem is the size of the data, but I am sure people work with datasets bigger than this. How can I handle this problem?
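Would reading the results in chunks be the right direction? This is only a sketch of what I have in mind (assuming my google-cloud-bigquery version supports to_dataframe_iterable(); the 100,000 page size is an arbitrary placeholder):

def executeQueryInChunks(query, page_size=100_000):
    client = getClient()
    # result(page_size=...) controls how many rows come back per page
    row_iterator = client.query(query).result(page_size=page_size)
    # to_dataframe_iterable() yields one DataFrame per page instead of
    # materializing all ~300 million rows in memory at once
    for chunk in row_iterator.to_dataframe_iterable():
        yield chunk

for chunk in executeQueryInChunks(ml_query):
    print(len(chunk))  # process / train on each chunk here instead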
I also tried to download the table as a CSV, but I learned that the row limit for the CSV export is 1 million. Any solution, including extracting the data and using PyCharm or a Jupyter notebook instead of Colab, is okay for me.
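For the extraction route, this is roughly what I had in mind: exporting the table to Cloud Storage as compressed CSV shards. The dataset path and bucket name below are just placeholders, and I am not sure this is the best way:

job_config = bigquery.ExtractJobConfig(
    destination_format=bigquery.DestinationFormat.CSV,
    compression=bigquery.Compression.GZIP,
)
extract_job = client.extract_table(
    'my_project_name.my_dataset.my_table_name',  # placeholder full table ID
    'gs://my-bucket/export/my_table_*.csv.gz',   # wildcard => multiple shards
    job_config=job_config,
)
extract_job.result()  # wait for the export job to finish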
I also have 100 GB of disk space in Colab.
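Given that disk space, I assume I could then pull the exported shards down into Colab and read them in chunks with pandas, along these lines (the bucket path is again a placeholder):

!gsutil -m cp gs://my-bucket/export/my_table_*.csv.gz .

import glob
import pandas as pd

for path in glob.glob('my_table_*.csv.gz'):
    # chunksize keeps memory usage bounded while reading each shard
    for chunk in pd.read_csv(path, chunksize=1_000_000, compression='gzip'):
        print(path, len(chunk))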
I would appreciate any help.