
I have performed certain computations on a dataset, and I need the result stored in an external file.

If I exported it to CSV, I'd have to convert it back to a DataFrame/SFrame before processing it further, which adds more lines of code.

Here's the snippet:

train_data = graphlab.SFrame(ratings_base)

Clearly, it is an SFrame and can be converted to a DataFrame using

df_train = train_data.to_dataframe()

Now that it is a DataFrame, I need to export it to a file without changing its structure, since the exported file will be passed as an argument to another Python script. That script must accept a DataFrame, not a CSV.

I have already checked place1, place2, place3, place4 and place5.

P.S. I'm still digging into Python serialization; if anyone can simplify it in this context, that would be helpful.

T3J45
  • It is not clear what exactly you are trying to achieve. If you have a dataframe, why do you need to export it to another dataframe? "I need the result to be stored in external file": have you tried pickle? – DeepSpace Jun 05 '17 at 08:37
  • @DeepSpace I need to write the dataframe to an external file, the way CSVs are written. That file will then be passed as an argument to another program. I hope this clears up the doubt. – T3J45 Jun 05 '17 at 08:43

1 Answer


I'd use the HDFS format, as it's supported by both Pandas and graphlab.SFrame, and besides that it is very fast.

Alternatively, you can export the Pandas.DataFrame to a pickle file and read it from another script:

sf.to_dataframe().to_pickle(r'/path/to/pd_frame.pickle')

to read it back (from the same or from another script):

df = pd.read_pickle(r'/path/to/pd_frame.pickle')
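
To pass the pickled frame to another program, one option is to hand over the file path as a command-line argument and unpickle it on the receiving side. The following is only a minimal sketch: the helper names `save_frame` and `load_frame_from_args`, the argument name `frame_path`, and the sample columns are hypothetical, and the small DataFrame stands in for the result of `train_data.to_dataframe()`.

```python
import argparse
import os
import tempfile

import pandas as pd


def save_frame(df, path):
    """Serialize the DataFrame to a pickle file (hypothetical helper)."""
    df.to_pickle(path)


def load_frame_from_args(argv):
    """Parse a pickle path from argv-style arguments and load the DataFrame."""
    parser = argparse.ArgumentParser(description="Consume a pickled DataFrame")
    parser.add_argument("frame_path", help="path to a pickled pandas DataFrame")
    args = parser.parse_args(argv)
    return pd.read_pickle(args.frame_path)


# Producer side: stand-in data; in the question this would be
# train_data.to_dataframe().
df_train = pd.DataFrame({"user_id": [1, 2, 3], "rating": [4.0, 3.5, 5.0]})
pickle_path = os.path.join(tempfile.mkdtemp(), "pd_frame.pickle")
save_frame(df_train, pickle_path)

# Consumer side: a separate script would pass sys.argv[1:] here instead
# of an explicit list.
df_loaded = load_frame_from_args([pickle_path])
print(df_loaded.equals(df_train))
```

In a real setup the consumer part would live in its own script and be invoked with the pickle path as its argument. Note that unpickling executes code embedded in the file, so only load pickle files you trust.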
MaxU - stand with Ukraine
  • Pickling is a form of serialization, but I wonder whether I can pass a pickled file as an argument. Is it possible? If so, how do I unpickle it? – T3J45 Jun 05 '17 at 08:48
  • @Tejas, there is an example in the answer: `df = pd.read_pickle(r'/path/to/pd_frame.pickle')` – MaxU - stand with Ukraine Jun 05 '17 at 08:50
  • My doubt was about passing it via arguments. Anyway, I'll try it out. Thanks for your contribution. – T3J45 Jun 05 '17 at 09:01
  • I have a popularity model from GraphLab; any ideas how I can pickle it? I pickled it, but reading it back returns an error saying `Is a directory`. – T3J45 Jun 05 '17 at 11:11