Questions tagged [modin]

Modin is a project to speed up pandas workflows by changing only a single import statement. Peruse the documentation at https://modin.readthedocs.io/.
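As a rough illustration of the single-import swap described above, the sketch below shows the usual one-line change in comments, plus a small fallback helper. The helper name `first_importable` and its default candidate list are hypothetical and are not part of Modin or pandas. The sketch assumes Modin (with an engine such as Ray or Dask) may or may not be installed.

```python
import importlib

# The swap itself is a one-line change at the top of a script:
#   import pandas as pd          # before
#   import modin.pandas as pd    # after

def first_importable(candidates=("modin.pandas", "pandas")):
    """Return the first module in `candidates` that can be imported.

    Hypothetical helper for illustration only: it tries each module name
    in order and raises ImportError if none of them is available.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate
    raise ImportError(f"none of {candidates!r} could be imported")
```

With this helper, `pd = first_importable()` would give a script Modin when it is installed and plain pandas otherwise. Whether that fallback is desirable depends on the workload, since the two do not always behave identically.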

80 questions
18
votes
3 answers

Cannot install RAY

Ray library from RISE lab (https://rise.cs.berkeley.edu/blog/pandas-on-ray/) I am using Windows 10 Pro, 64-bit and running these scripts from Anaconda prompt. I have tried both pip install ray and pip3 install ray with the same…
cube
  • 345
  • 1
  • 2
  • 9
16
votes
2 answers

Comparison between Modin | Dask | Data.table | Pandas for parallel processing and out of memory csv files

What are the fundamental differences and primary use cases for Dask | Modin | Data.table? I checked the documentation of each library, and all of them seem to offer a 'similar' solution to pandas' limitations.
Shubham Samant
  • 171
  • 1
  • 5
7
votes
4 answers

Error while importing library "modin" in Python 3.6

import modin.pandas as pd I am importing modin.pandas library in my windows 10 machine but getting error "AttributeError: module 'ray' has no attribute 'utils'" Anything missed while installing modin library?
Learnings
  • 2,780
  • 9
  • 35
  • 55
7
votes
0 answers

Is modin useful on AWS Lambda

AWS Lambda comes with 6 vCPU. Modin for Pandas promises to use cores to make processing efficient. Does this actually deliver on AWS Lambda, which otherwise does not support multi-threading, multi-processing etc. ? # import pandas as pd import…
bonney
  • 537
  • 4
  • 15
5
votes
2 answers

how to load modin dataframe from pyarrow or pandas

Since Modin does not support loading from multiple pyarrow files on s3, I am using pyarrow to load the data. import s3fs import modin.pandas as pd from pyarrow import parquet s3 = s3fs.S3FileSystem( key=aws_key, …
galinden
  • 610
  • 8
  • 13
4
votes
1 answer

modin pandas read_parquet() failed on ETag KeyError trying to read a partitioned parquet from s3

I created a dataframe from pandas and used to_parquet(...) to write to s3 directly. arguments are: df.to_parquet('s3://bucket/fn.parquet', compression='gzip', engine='fastparquet', partition_cols=['col1']) when I use pandas's…
michaelgbj
  • 290
  • 1
  • 10
4
votes
1 answer

Modin is taking more time than pandas for reading CSV

I'm using modin.pandas to scale pandas for large dataset. However, when using pd.read_csv to load a 5 MB csv dataset in jupyter notebook to compare the performance of modin.pandas and pandas, it gives unexpected time duration of…
Shradha
  • 2,232
  • 1
  • 14
  • 26
3
votes
1 answer

Ray object store running out of memory using out of core. How can I configure an external object store like s3 bucket?

import ray import numpy as np ray.init() @ray.remote def f(): return np.zeros(10000000) results = [] for i in range(100): print(i) results += ray.get([f.remote() for _ in range(50)]) Normally, when the object store fills up, it begins…
testgauss321
  • 77
  • 1
  • 5
3
votes
1 answer

Speeding up reading and operating on 30,000 csv files

I am using Python 3 and pandas(pd.read_csv) to read the files. There are no headers and the separator is ' |, | '. Also, the files are not .csv files and the operating system is CentOS. There are 30,000 files in a folder with a total size of 10GB.…
Adienl
  • 155
  • 6
3
votes
3 answers

Unable to fully install and import Modin Package

I am trying to use the modin package to speed up my pandas dataframe calculations. In short, the installation has not been as straightforward as pip install modin When simply running pip install modin everything seems to be going fine (except for…
Merv Merzoug
  • 1,149
  • 2
  • 19
  • 33
2
votes
1 answer

Modin shows a warning message "Perhaps you already have a cluster running?"

I am using modin to read an sql table, however I am getting this warning import pyodbc import sqlalchemy as sal from sqlalchemy import create_engine import modin.pandas as pd from distributed import Client client = Client() …
Debayan
  • 572
  • 6
  • 16
2
votes
1 answer

ERROR: No matching distribution found for pandas==1.0.3 (from modin)

I'm trying to speed up my code using parallel processing with the modin library. I tried to do it with the dask engine on my Windows 10 computer but it didn't work, I thought that it is because it is still under development. I read that you can't…
Geno
  • 21
  • 1
  • 3
2
votes
1 answer

Faster pandas apply using modin.pandas

Trying to use all cores for this apply function using modin.pandas from nltk.sentiment.vader import SentimentIntensityAnalyzer sid = SentimentIntensityAnalyzer() # sentiment Score of essay data = data.merge(data.essay.apply(lambda s:…
dracarys3
  • 107
  • 2
  • 12
2
votes
1 answer

My code is running properly in pandas, but not in modin

When I use pandas, the code works perfectly (but very slowly), but when I use modin and concatenate dataframes, it shows me an error. contador = 0 df = pd.DataFrame() data = pd.DataFrame() for file in range(len(files)): usefile = files[file] …
zkittlez
  • 21
  • 3
1
vote
1 answer

import modin.pandas and ray() don't close file

I'm trying to use modin and ray() but I can't move a file after reading it. At the line shutil.move(f"./IMPORT/"+file,f"./IMPORTED/"+file) the file is still open. Is there some way to close it and move it to another folder? Here is the entire code: import os …