-3

I'm trying to learn the basics of data science, so I'm attempting to mine the following dataset (CSV) using Python.

 https://www.kaggle.com/jessevent/all-crypto-currencies

70mg file

 My setup

6-core AMD Bulldozer

120 GB SSD

2GB data drive

Ubuntu 12.04 with libvirt KVM/QEMU

Python 3

Python 2.7

It turns out my setup can't crunch the numbers: I need to save the data in Excel format (.xlsx), and I can't save the file because the system hangs.

I'm wondering if it's possible to use the native CSV file directly with numpy, matplotlib, pandas etc., rather than loading the data from an Excel file.

 I need to be able to use the dataset for basic data exploration, data cleaning, model construction, validation etc.

  • 2
    This question is unclear. What exactly are you trying to do? What's the code that makes the computer hang? Why do you need to save the data in Excel format? And why does your file weigh 70 milligrams? – Sven Marnach May 14 '18 at 09:04
  • Can I run packages against raw CSV files? – user1049286 May 14 '18 at 09:37
  • 1
    Pandas can open csv's chunkwise - not sure if that helps you: https://stackoverflow.com/questions/33642951/python-using-pandas-structures-with-large-csviterate-and-chunksize – Patrick Artner May 14 '18 at 13:25

1 Answer

2

Python's CSV support is much better than its Excel support.

Excel plays only a minor role in data mining and is a pain to use, whereas CSV files are everywhere and straightforward to work with.

So just load the CSV file with the CSV file loading function instead of using the Excel file loader.
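For illustration, here is a minimal sketch of what that could look like with pandas, assuming pandas is the library in use. It reads a tiny in-memory sample so it runs anywhere; the column names (symbol, date, close) and the helper names are hypothetical and not taken from the actual dataset. The second function shows the chunked reading mentioned in the comments, which may help if memory is tight.

```python
import io

import pandas as pd

# Tiny stand-in for the real file so the sketch runs without the dataset.
# The column names are assumptions, not taken from the Kaggle file.
SAMPLE_CSV = io.StringIO(
    "symbol,date,close\n"
    "BTC,2018-05-01,9000.0\n"
    "BTC,2018-05-02,9200.0\n"
    "ETH,2018-05-01,670.0\n"
    "ETH,2018-05-02,690.0\n"
)


def load_csv(source):
    """Read the whole CSV into a DataFrame; no Excel conversion involved."""
    return pd.read_csv(source, parse_dates=["date"])


def mean_close_in_chunks(source, chunksize=2):
    """Average the 'close' column while reading only a few rows at a time."""
    total, count = 0.0, 0
    for chunk in pd.read_csv(source, chunksize=chunksize):
        total += chunk["close"].sum()
        count += chunk["close"].count()  # counts non-missing values only
    return total / count


df = load_csv(SAMPLE_CSV)
print(df.describe())

SAMPLE_CSV.seek(0)  # rewind the in-memory file before reading it again
print(mean_close_in_chunks(SAMPLE_CSV))
```

To try it on the real data, pass the path of the downloaded CSV file instead of SAMPLE_CSV and adjust the column names to whatever the file actually contains.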

Has QUIT--Anony-Mousse
  • Damn, it's working and super quick, kudos to you my man. with open('/home/Documents/crypto-markets.csv', 'r') as fp: # reader = csv.reader(fp, delimiter=',', quotechar='"') – user1049286 May 14 '18 at 20:39