3

I am working on an Information Retrieval project, for which I am using Google Colab. I am at the phase where I have computed some features ("input_features") and the corresponding labels ("labels") with a for loop, which took about 4 hours to finish.

So at the end I converted the accumulated results into NumPy arrays:

input_features = np.array(input_features)
labels = np.array(labels)

So my question is: is it possible to save those results so that I can reuse them in future sessions when using Google Colab?

I have found 2 options that might work, but I don't know where the resulting files are created.

1) Save them as csv files. My code for this would be:

from numpy import savetxt
# save to csv file
savetxt('input_features.csv', input_features, delimiter=',')
savetxt('labels.csv', labels, delimiter=',')

And in order to load them:

from numpy import loadtxt
# load array
input_features = loadtxt('input_features.csv', delimiter=',')
labels = loadtxt('labels.csv', delimiter=',')
# print the array
print(input_features)
print(labels)

But I still don't get anything back when I print.
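Maybe I first need to check whether the csv files are even still there before loading them; I assume something like this would show the current working directory and whether the files exist:

import os

# Show where the notebook is currently writing files (usually /content on Colab)
print(os.getcwd())

# Check whether the csv files from above actually exist there
for fname in ['input_features.csv', 'labels.csv']:
    print(fname, 'exists:', os.path.isfile(fname))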


2) Save the arrays using pickle, for which I followed the instructions from here: https://colab.research.google.com/drive/1EAFQxQ68FfsThpVcNU7m8vqt4UZL0Le1#scrollTo=gZ7OTLo3pw8M

from google.colab import files
import pickle
def features_pickeled(input_features, results):
  input_features = input_features + '.txt'
  pickle.dump(results, open(input_features, 'wb'))
  files.download(input_features)
def labels_pickeled(labels, results):
  labels = labels + '.txt'
  pickle.dump(results, open(labels, 'wb'))
  files.download(labels)

And to load them back:

from io import BytesIO

def load_features_from_local():
  loaded_features = {}
  uploaded = files.upload()
  for input_features in uploaded.keys():
      pickled_features = uploaded[input_features]
      loaded_features[input_features] = pickle.load(BytesIO(pickled_features))
  return loaded_features

def load_labels_from_local():
  loaded_labels = {}
  uploaded = files.upload()
  for labels in uploaded.keys():
      pickled_labels = uploaded[labels]
      loaded_labels[labels] = pickle.load(BytesIO(pickled_labels))
  return loaded_labels

# How do I print the pickled files to see if I have them ready for use?

When using plain Python I would do something like this with pickle:

# Create the pickle file
with open("name.pickle", "wb") as pickle_file:
    pickle.dump(name, pickle_file)

# Load the pickle file
with open("name.pickle", "rb") as name_pickled:
    name_b = pickle.load(name_pickled)

But the thing is that I don't see any files being created in my Google Drive.

Is my code correct, or am I missing some part of it?

This is a long description, but hopefully it explains in detail what I want to do and what I have tried for this issue.

Thank you in advance for your help.

Ledian K.

2 Answers

4

Google Colaboratory notebook instances run on virtual machines, so they are never guaranteed to have access to the same resources when you disconnect and reconnect. Therefore, you can't simply "save" your data on the Colab instance itself. Here are a few solutions:

  1. Colab saves your code. If the for loop operation you referenced doesn't take an extreme amount of time to run, just leave the code and run it every time you connect your notebook.
  2. Check out np.save. This function allows you to save an array to a binary file. Then, you could re-upload your binary file when you reconnect your notebook. Better yet, you could store the binary file on Google Drive, mount your drive to your notebook, and reference it like that (see the sketch below).
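A minimal sketch of option 2, assuming the `input_features` and `labels` arrays from the question (the `ir_project` folder name is just an example, and on older mounts the Drive root appears as `My Drive` instead of `MyDrive`):

import os
import numpy as np
from google.colab import drive

# Mount Google Drive so anything written under it survives a VM reset
drive.mount('/content/gdrive')

# Hypothetical project folder on Drive; create it if it does not exist yet
save_dir = '/content/gdrive/MyDrive/ir_project'
os.makedirs(save_dir, exist_ok=True)

# Save the arrays as binary .npy files on Drive
np.save(os.path.join(save_dir, 'input_features.npy'), input_features)
np.save(os.path.join(save_dir, 'labels.npy'), labels)

# In a later session: mount Drive again, then load the arrays back
input_features = np.load(os.path.join(save_dir, 'input_features.npy'))
labels = np.load(os.path.join(save_dir, 'labels.npy'))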
  • @Nikos Papas Also take a look at [this question](https://stackoverflow.com/questions/47194063/persisting-data-in-google-colaboratory). Seems like this may be similar to what you're asking. – Andrew Wiedenmann May 08 '20 at 17:47
  • 1
    Thank you very much for your answer, because you made me think differently. Just so other people who may have a similar question can know, you can create the csv file: `!ls` (to see what files you have in your working directory), then `import csv`, `file_name_csv = [original_table]`, and `with open("file_name_csv.csv", "w") as f: writer = csv.writer(f); writer.writerows(file_name_csv)`. Then at the left corner you press the little "Files" icon, and there is your csv file, from where you can download it. – Ledian K. May 09 '20 at 20:32
  • @LedianK. Would you mind showing your full code as to how you saved the array to a gdrive directory and then how you loaded it again into colab? – Yogesh Riyat Nov 06 '21 at 18:51
  • @YogeshRiyat it's been a while since I posted this question but you can check the code below. Hope this helps you :) – Ledian K. Nov 07 '21 at 00:12
1
# Mount Google Drive to authenticate yourself to gdrive
from google.colab import drive
drive.mount('/content/gdrive')

#---

# Import necessary libraries
import numpy as np
from numpy import savetxt
import pandas as pd

#---

# Create array
arr = np.array([1, 2, 3, 4, 5])

# save to csv file
savetxt('arr.csv', arr, delimiter=',')  # You will see the file if you open the Files icon (left panel)

And then you can load it again by:

# You can copy the path once you find your file in the Files panel
arr = pd.read_csv('/content/arr.csv', sep=',', header=None) # You can also save your result as a txt file
arr
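
Note that '/content/arr.csv' lives on the temporary VM disk and disappears when the runtime is recycled. If the goal is to keep the file between sessions, one way (sketch only, with an example subfolder name; on older mounts the Drive root shows up as 'My Drive' rather than 'MyDrive') is to write it under the mounted Drive path:

import os
import numpy as np
from numpy import savetxt
import pandas as pd

# Hypothetical folder inside the Drive mounted above; create it if needed
drive_dir = '/content/gdrive/MyDrive/ir_project'
os.makedirs(drive_dir, exist_ok=True)

# Same toy array as above, saved directly onto Drive
arr = np.array([1, 2, 3, 4, 5])
savetxt(os.path.join(drive_dir, 'arr.csv'), arr, delimiter=',')

# In a later session, after drive.mount('/content/gdrive'), read it back
arr = pd.read_csv(os.path.join(drive_dir, 'arr.csv'), sep=',', header=None)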
Ledian K.