
I'm interested in recommended, fast ways of creating cudf DataFrames from dense numpy objects. I have seen many examples that split the columns of a 2D numpy matrix into tuples and then call cudf.DataFrame on the list of tuples -- this is rather expensive. Using numba.cuda.to_device, by contrast, is quite fast. Is it possible to use numba.cuda.to_device, or is there a more efficient way of constructing the DataFrame?

In [1]: import cudf

In [2]: import numba.cuda

In [3]: import numpy as np

In [4]: data = np.random.random((300,100))

In [5]: data.nbytes
Out[5]: 240000

In [6]: %time numba.cuda.to_device(data)
CPU times: user 8 ms, sys: 0 ns, total: 8 ms
Wall time: 4.45 ms
Out[6]: <numba.cuda.cudadrv.devicearray.DeviceNDArray at 0x7f8954f84550>

In [7]: record_data = (('fea%d'%i, data[:,i]) for i in range(data.shape[1]))

In [8]: %time cudf.DataFrame(record_data)
CPU times: user 960 ms, sys: 508 ms, total: 1.47 s
Wall time: 1.61 s
Out[8]: <cudf.DataFrame ncols=100 nrows=300 >

The above shows cudf.DataFrame ~360x slower than a direct call to numba.cuda.to_device

Robert Crovella
quasiben
  • I ran your code on a GTX 1080ti using Jupyter from the official RAPIDS docker image `rapidsai/rapidsai:cuda9.2-runtime-ubuntu16.04`. `numba.cuda.to_device(...)`: `CPU times: user 1.88 ms, sys: 13.2 ms, total: 15 ms`, `cudf.DataFrame(...)`: `CPU times: user 10 µs, sys: 4 µs, total: 14 µs` – cwharris May 20 '19 at 13:14
  • First, you should file an issue. I don't think `cuda.DataFrame` should take so long, even if it is creating hundreds of columns. Second, have you tried combining `cudf.from_dlpack` and `cupy.to_dlpack`? – harrism May 27 '19 at 11:59

2 Answers


cudf.DataFrame uses a dedicated columnar format and performs best with data that is tall rather than wide. However, there are some important zero-copy functions that let you move data between numba/cupy/cudf inexpensively. At this point in time, as far as I know, the best way to get a raw numpy matrix into cudf is to use the to_device method as you identified, followed by from_gpu_matrix in cudf.
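As a CPU-side sketch of why the per-column construction in the question is so much slower: each column slice of a C-ordered (row-major) numpy matrix is a strided, non-contiguous view, so building the DataFrame column-by-column implies one copy (and, on the GPU path, one host-to-device transfer) per column, whereas the whole matrix is a single contiguous buffer that can move in one transfer. This uses only numpy, mirroring the shapes from the question:

```python
import numpy as np

# A wide, C-ordered (row-major) matrix like the one in the question.
data = np.random.random((300, 100))

# Each column slice is a strided view, not contiguous memory, so a
# column-by-column build forces one copy/transfer per column:
col = data[:, 0]
print(col.flags['C_CONTIGUOUS'])   # False

# The whole matrix, in contrast, is one contiguous 240 kB buffer that
# can be moved to the device in a single transfer:
print(data.flags['C_CONTIGUOUS'])  # True
print(data.nbytes)                 # 240000
```

This is the layout reason the single to_device call plus from_gpu_matrix below stays cheap regardless of the column count.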

import cudf
import numba.cuda
import numpy as np
data = np.random.random((300, 100))
%time gpu = numba.cuda.to_device(data)
%time df = cudf.DataFrame.from_gpu_matrix(gpu, columns=['fea%d' % i for i in range(data.shape[1])])

Out:

CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 872 µs
CPU times: user 180 ms, sys: 0 ns, total: 180 ms
Wall time: 186 ms

The 186 ms spent creating the cudf.DataFrame is the minimum creation time; it is overhead primarily from host-side management of columnar memory and metadata.

Thomson Comer

Note that the cudf.DataFrame.from_gpu_matrix() method has been deprecated since RAPIDS 0.17.

Nowadays, cudf.DataFrame() accepts Numba DeviceNDArrays directly as input data.

import cudf
import numba as nb

# Convert a Numba DeviceNDArray to a cuDF DataFrame
src = nb.cuda.to_device([[1, 2], [3, 4]])
dst = cudf.DataFrame(src)

print(type(dst), "\n", dst)