Questions tagged [cudf]

Use this tag for questions specifically related to the cuDF library or cuDF DataFrame manipulations.

From PyPI: The RAPIDS cuDF library is a GPU DataFrame manipulation library based on Apache Arrow that accelerates loading, filtering, and manipulation of data for model training data preparation. The RAPIDS GPU DataFrame provides a pandas-like API that will be familiar to data scientists, so they can now build GPU-accelerated workflows more easily.

146 questions
6 votes, 4 answers

How do I install cudf using pip?

I wanted to accelerate pandas on my GPU, so I decided to use the cuDF library. Please suggest other libraries (if any). I tried to install cuDF using pip with pip3.6 install cudf-cuda92. The pip version is 19.2.3 (latest). When I run pip3.6 install…
rahul_5409 • 71 • 1 • 1 • 5
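At the time many of these questions were asked, cuDF was not installable from plain PyPI. Current RAPIDS releases do publish pip wheels, but on NVIDIA's own package index; a sketch of that install path (package name and index URL are version- and CUDA-dependent, so check the RAPIDS install guide for your setup):

```shell
# cuDF wheels are served from NVIDIA's package index, not the default PyPI;
# the package name encodes the CUDA major version (cudf-cu12 for CUDA 12).
pip install cudf-cu12 --extra-index-url=https://pypi.nvidia.com
```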
6 votes, 2 answers

How to do a matrix dot product on the GPU with rapids.ai

I'm using cuDF, part of the RAPIDS ML suite from Nvidia. Using this suite, how would I do a dot product? df = cudf.DataFrame([('a', list(range(20))), ('b', list(reversed(range(20)))), ('c', list(range(20)))]) e.g. how would I perform a dot…
Pablojim • 8,542 • 8 • 45 • 69
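cuDF mirrors the pandas API here: `.values` on a cudf.DataFrame returns a CuPy device array rather than a NumPy array, so a matrix-multiply expression written against pandas typically runs unchanged on the GPU. A CPU sketch with pandas/NumPy (swap in cudf on a GPU machine; data shortened to 3 rows for clarity):

```python
import pandas as pd

# CPU analogue of the cuDF question: on a cudf.DataFrame, .values returns a
# CuPy array instead of a NumPy array, so `m @ m.T` runs on the GPU there.
df = pd.DataFrame({
    "a": list(range(3)),
    "b": list(reversed(range(3))),
    "c": list(range(3)),
})

m = df.values      # (3, 3) ndarray; a CuPy ndarray under cuDF
dot = m @ m.T      # dot product of each row with every other row
print(dot.shape)   # (3, 3)
```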
4 votes, 0 answers

How to convert a cudf.core.dataframe.DataFrame into a pandas.DataFrame?

I have a cuDF dataframe: type(pred) > cudf.core.dataframe.DataFrame print(pred) > action 1778378 0 1778379 1 1778381 1 1778383 0 1778384 0 ... ... 2390444 0 2390446 0 2390478 0 2390481 …
Soerendip • 7,684 • 15 • 61 • 128
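For the conversion itself, cuDF provides `DataFrame.to_pandas()`, which copies the frame from GPU device memory to host memory. A minimal sketch (the `to_host` helper is hypothetical, and the pandas fallback lets the example run on a machine without a GPU):

```python
import pandas as pd

def to_host(df):
    """Return a pandas.DataFrame: calls .to_pandas() on cuDF objects,
    passes plain pandas objects through unchanged."""
    return df.to_pandas() if hasattr(df, "to_pandas") else df

# Stand-in for the `pred` frame in the question; with cudf installed this
# would be a cudf.DataFrame, and to_host(pred) would copy device -> host.
pred = pd.DataFrame({"action": [0, 1, 1, 0, 0]})
pdf = to_host(pred)
print(type(pdf))  # <class 'pandas.core.frame.DataFrame'>
```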
4 votes, 0 answers

GPU-based combinatoric resolver with table group-by operations

Given a table with many columns |-------|-------|-------|-------| | A | B | .. | N | |-------|-------|-------|-------| | 1 | 0 | .. | X | | 2 | 0 | .. | Y | | .. | .. | .. | .. …
Reacher234 • 230 • 2 • 11
4 votes, 2 answers

Recommended cuDF DataFrame Construction

I'm interested in recommended, fast ways of creating cuDF DataFrames from dense numpy objects. I have seen many examples of splitting out columns of a 2d numpy matrix into tuples, then calling cudf.DataFrame on a list of tuples -- this is rather…
quasiben • 1,444 • 1 • 11 • 19
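Recent cuDF versions accept a 2-D array in the constructor directly, matching pandas, so the column-by-column tuple splitting described in the question is typically no longer necessary. A CPU sketch with pandas (the equivalent cudf.DataFrame call is noted in a comment; verify it against your cuDF version):

```python
import numpy as np
import pandas as pd

arr = np.arange(12).reshape(4, 3)  # dense 4x3 matrix

# pandas form; recent cuDF accepts the same call on the GPU:
#   gdf = cudf.DataFrame(arr, columns=["a", "b", "c"])
df = pd.DataFrame(arr, columns=["a", "b", "c"])
print(df.shape)  # (4, 3)
```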
3 votes, 1 answer

Replace integers with np.NaN in a cudf dataframe

I have a dataframe like this: df_a = cudf.DataFrame() df_a['key'] = [0, 1, 2, 3, 4] df_a['values'] = [1,2,np.nan,3,np.nan] and I would like to replace all 2s with np.nan. Usually in a pandas dataframe I would use df_a[df_a==2]=np.nan, but in cudf…
paka • 55 • 7
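Whole-frame boolean assignment (`df_a[df_a==2] = np.nan`) has historically not been supported in cuDF, but per-column `.replace` follows the pandas API. A pandas sketch of that route (the same `.replace` call is expected to work on a cudf.Series, hedged on your cuDF version):

```python
import numpy as np
import pandas as pd

# Same data as the question, with pandas standing in for cudf.
df_a = pd.DataFrame()
df_a["key"] = [0, 1, 2, 3, 4]
df_a["values"] = [1, 2, np.nan, 3, np.nan]

# Replace every 2 in the column with NaN; cuDF Series also implement .replace.
df_a["values"] = df_a["values"].replace(2, np.nan)
print(int(df_a["values"].isna().sum()))  # 3
```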
3 votes, 1 answer

Why is Polars called the fastest dataframe library? Isn't Dask with cuDF more powerful?

Most of the benchmarks have Dask and cuDF isolated, but I can use them together. Wouldn't Dask with cuDF be faster than Polars? Also, Polars only runs if the data fits in memory, but this isn't the case with Dask. So why is there…
zacko • 179 • 2 • 9
3 votes, 3 answers

Install cudf on Databricks

I am trying to use cuDF on Databricks. I started by following https://medium.com/rapids-ai/rapids-can-now-be-accessed-on-databricks-unified-analytics-platform-666e42284bd1, but the init script link is broken. Then, I followed this link…
Etienne Herlaut • 526 • 4 • 12
3 votes, 1 answer

Rolling linear regression for use with a groupby operation on a cuDF dataframe

I would like to calculate the rolling slope of y_value over x_value using cuML LinearRegression. Sample data (cuDF dataframe): | date | x_value | y_value | | ------ | ------ | ---- | | 2020-01-01 | 900 | 10 | | 2020-01-01 |…
nasiha • 31 • 1
3 votes, 4 answers

In-memory database optimized for reads (low/no writes) when operations involve sorting, aggregating, and filtering on any column

I am looking to load ~10GB of data into memory and perform SQL on it in the form of: sort on a single column (any column), aggregate on a single column (any column), filter on a single column (any column). What might be a good choice for performance?…
David542 • 104,438 • 178 • 489 • 842
3 votes, 1 answer

What is the relationship between BlazingSQL and Dask?

I'm trying to understand whether BlazingSQL is a competitor or complementary to Dask. I have some medium-sized data (10-50GB) saved as parquet files on Azure blob storage. IIUC I can query, join, aggregate, and group by with BlazingSQL using SQL syntax, but I…
Dave Hirschfeld • 768 • 2 • 6 • 15
3 votes, 1 answer

How do you determine memory stats while using rapids.ai?

I'm using the Python libraries of rapids.ai, and one of the key things I'm starting to wonder is: how do I inspect memory allocation programmatically? I know I can use nvidia-smi to look at some overall high-level stats, but specifically I would like to…
Robert • 1,220 • 16 • 19
3 votes, 1 answer

How to read a single large parquet file into multiple partitions using dask/dask-cudf?

I am trying to read a single large parquet file (size > gpu_size) using dask_cudf/dask, but it is currently read into a single partition, which I am guessing is the expected behavior, inferring from the doc-string:…
Vibhu Jawa • 88 • 9
3 votes, 1 answer

Running RAPIDS without a GPU for development?

Is there a way to run RAPIDS without a GPU? I usually develop on a small local machine without a GPU, then push my code to a powerful remote server for real use. Things like TensorFlow allow switching between CPU and GPU depending on whether they're…
golmschenk • 11,736 • 20 • 78 • 137
2 votes, 0 answers

How to convert a dask_cudf column to datetime?

How can we convert a dask_cudf column of strings or nanoseconds to a datetime object? to_datetime is available in pandas and cudf. See sample data below: import pandas import cudf # with pandas df = pandas.DataFrame( {'city' :…
dleal • 2,244 • 6 • 27 • 49
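`cudf.to_datetime` mirrors `pandas.to_datetime`, and for a dask_cudf column the usual pattern is to apply the conversion per partition via `map_partitions` (an assumption based on dask/cuDF API parity; verify against your versions). A CPU sketch of the underlying conversion with pandas:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["NYC", "SF"],
    "ts": ["2020-01-01 00:00:00", "2020-06-15 12:30:00"],
})

# pandas form; cudf.to_datetime takes the same call. For a dask_cudf column,
# a hedged per-partition sketch would be:
#   ddf["ts"] = ddf["ts"].map_partitions(cudf.to_datetime)
df["ts"] = pd.to_datetime(df["ts"])
print(df["ts"].dt.year.tolist())  # [2020, 2020]
```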