I have a 1.3 GB TSV data file that I need to analyse with R or Python. My machine has 8 GB of RAM and runs Windows 8. I am not able to load the file in RStudio or any other file-reading application. What do you recommend so that I can read the file and work on it? Should I move to Amazon with Hadoop? This looks like a big data problem to me.
- If you don't read the entire file into memory, for example by reading it line by line and doing some work on each line, then you should be fine (see the chunked-reading sketch after these comments). There are also ways to run MapReduce locally, so a Hadoop cluster is completely unnecessary for the size of data you have. – OneCricketeer Apr 04 '16 at 05:29
- Yes, but when I write the R program it is going to work on the entire file, so even if I just peek at the top k records, that is not going to help. Could you give some more information about running a MapReduce job locally? – Zack Apr 04 '16 at 05:32
- Also, I need to work with either R or Python. – Zack Apr 04 '16 at 05:35
- Ever heard of the Cloudera Quickstart VM? And I don't use R, but here's a Python link: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/ – OneCricketeer Apr 04 '16 at 05:38
- Anyway, back to the point that Hadoop isn't needed: I'm fairly certain pandas in Python can read and crunch data of that size very reasonably. – OneCricketeer Apr 04 '16 at 05:41
- I'm surprised that the TSV file can't be loaded in. Are you getting an error saying that a file greater than 1 GB cannot be loaded? I saw people working with a 2 GB file on a similarly spec'd workstation just yesterday. The default function for reading files has a limit of 1 GB, but there are libraries which will let you load bigger ones. – Christopher Krapu Apr 04 '16 at 14:56
- @cricket_007 I am able to work using Python pandas. You may post your answer and I would accept it. – Zack Apr 04 '16 at 19:09
- Possible duplicate of [How do I load a tsv file into a Pandas DataFrame?](http://stackoverflow.com/questions/9652832/how-to-i-load-a-tsv-file-into-a-pandas-dataframe) – OneCricketeer Apr 04 '16 at 19:15
- @cricket_007 That question is totally different and has no relation to this question. – Zack Apr 04 '16 at 19:25
- It tells you how to load the file. I'm not going to copy an answer telling you how to do it with pandas because the answer you are looking for is already there. – OneCricketeer Apr 04 '16 at 19:35
- @cricket_007 The question is not about how to load a file with pandas. As far as I can tell, they're unable to load the file due to its size. – Rob Apr 08 '16 at 12:56
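
The comments above suggest processing the file without ever holding all of it in memory. A minimal sketch of that approach with pandas, assuming a tab-separated file named `data.tsv` with a numeric column named `value` (both names are placeholders, not from the original post):

```python
import pandas as pd

# Process the TSV in fixed-size chunks so the full 1.3 GB never sits in memory at once.
# "data.tsv" and the column name "value" are illustrative placeholders.
total = 0.0
rows = 0
for chunk in pd.read_csv("data.tsv", sep="\t", chunksize=100000):
    total += chunk["value"].sum()   # replace with whatever per-chunk work you need
    rows += len(chunk)

print("rows:", rows, "mean of 'value':", total / rows)
```

Each chunk is an ordinary DataFrame, so any aggregation that can be combined across chunks (sums, counts, partial group-by results) works this way, much like a local map/reduce step.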
1 Answer
I was facing problems loading the file in R, but I am able to load it with Python pandas, and it works on my 8 GB RAM machine.
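
A minimal sketch of loading the full file with pandas under the 8 GB constraint; the filename, column names, and dtypes below are assumptions for illustration, not from the original post. Reading only the columns you need and giving them narrow dtypes keeps the in-memory footprint well below the on-disk size:

```python
import pandas as pd

# Read the whole TSV at once; usecols and explicit dtypes keep the in-memory
# footprint small. All names below are illustrative placeholders.
df = pd.read_csv(
    "data.tsv",
    sep="\t",
    usecols=["id", "category", "value"],                                  # only the columns you need
    dtype={"id": "int32", "category": "category", "value": "float32"},    # narrower than the defaults
)

df.info(memory_usage="deep")  # prints the actual RAM usage of the loaded frame
```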

- You've written this in a way that indicates it was a solution, but your question said you already had 8 GB. If this is an addendum to your question, please delete this answer and edit it into your question. – Rob Apr 08 '16 at 12:54
- I read it as "I couldn't get it to work with R, but I did get it to work with Python pandas, even on my 8 GB machine". – Gimby Apr 08 '16 at 12:58