
I have a large dataset, about 9232363 x 102 (roughly a 10 GB file), stored in a MongoDB collection. I have a 12 GB RAM system. How can I read this with pandas and convert it into a DataFrame? First I tried

df = pd.DataFrame(list(mng_clxn.find({})))

It freezes my system.

So I tried to read only specific columns, but still no use. I read them like this:

df = pd.DataFrame(list(mng_clxn.find({}, {'col1': 1, 'col2': 1, 'col3': 1, 'col4': 1})))

Another thing I tried was reading it in chunks:

df_db = pd.DataFrame()
offset = 0
thresh = 1000000

while offset < 9232363:
    chunk = pd.DataFrame(list(mng_clxn.find({}).limit(thresh).skip(offset)))
    offset += thresh
    df_db = df_db.append(chunk)

That was also no use. What should I do now?

Can I solve this problem with my system (12 GB RAM)? Any ideas would be appreciated.

Feel free to mark this as a duplicate if you find any other SO questions similar to it.

Thanks in advance.

Mohamed Thasin ah

1 Answer


You'll likely need more memory to handle that dataset in a reasonable way. Consider running step 4 from this question to be sure. You might also look at this question about using pandas with large datasets, but in general you'll probably want more than 2 GB of headroom for manipulating the data even if you find a way to load it in.
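
As a rough way to gauge whether the data can fit at all, a sketch along these lines might help (it assumes mng_clxn is the pymongo collection from the question; the sample size is arbitrary): pull a small sample, measure its DataFrame footprint, and extrapolate to the full 9232363 rows.

import pandas as pd

SAMPLE_SIZE = 10000
TOTAL_DOCS = 9232363

# Pull a small sample of documents and build a DataFrame from it.
sample = pd.DataFrame(list(mng_clxn.find({}).limit(SAMPLE_SIZE)))

# Measure memory including object (string) columns, then extrapolate.
sample_bytes = sample.memory_usage(deep=True).sum()
estimated_gb = sample_bytes / SAMPLE_SIZE * TOTAL_DOCS / 1024 ** 3
print('Estimated in-memory size: %.1f GB' % estimated_gb)

# Downcasting numeric columns shrinks the footprint and may make the
# difference between fitting in 12 GB or not.
for col in sample.select_dtypes(include='float').columns:
    sample[col] = pd.to_numeric(sample[col], downcast='float')
for col in sample.select_dtypes(include='integer').columns:
    sample[col] = pd.to_numeric(sample[col], downcast='integer')

If the estimate is still well above 12 GB after downcasting and projecting only the columns you need, more RAM or out-of-core processing is the realistic option.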

Seth Rothschild
  • I understand that I can't read the whole collection at once, but I can read a file of the same size using pandas chunks. Is there any way to read a collection in chunks? – Mohamed Thasin ah Mar 09 '18 at 05:41
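
For what it's worth, a minimal sketch of reading the collection chunk by chunk without keeping everything in memory (the chunk size, projection, and output filename below are placeholders; mng_clxn is the collection from the question): page through the cursor with skip/limit and write each chunk to disk instead of appending it all to one big DataFrame.

import pandas as pd

CHUNK_SIZE = 100000
TOTAL_DOCS = 9232363
projection = {'col1': 1, 'col2': 1, 'col3': 1, 'col4': 1}

for offset in range(0, TOTAL_DOCS, CHUNK_SIZE):
    cursor = mng_clxn.find({}, projection).skip(offset).limit(CHUNK_SIZE)
    chunk = pd.DataFrame(list(cursor))
    if chunk.empty:
        break
    # Write each chunk to disk instead of holding it in memory;
    # only write the header for the first chunk.
    chunk.to_csv('collection_dump.csv', mode='a',
                 header=(offset == 0), index=False)

Note that skip() gets slower as the offset grows, so on a collection this size it is usually faster to page on an indexed field, e.g. filter with {'_id': {'$gt': last_id}} and sort on _id.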