Questions tagged [py-datatable]

Use this tag for questions related to the `datatable` Python library. Consider tagging your questions with [python] as well. Do not use this tag to ask questions about generic "tables of data".

Datatable is a Python library for manipulating two-dimensional data tables (called Frames). It is similar in spirit to Python's pandas and R's data.table.

108 questions
8 votes, 1 answer

Filter rows of python datatable based on whether it is in a list

I am new to working with Python datatables, and here is the tutorial I am following. How do I filter out the rows where the values in a certain column are contained in a list? Essentially, this is the code I am working with: `import datatable as dt` …
Zanam
8 votes, 4 answers

py-datatable 'in' operator?

I am unable to perform a standard `in` operation with a pre-defined list of items. I am looking to do something like this: `# Construct a simple example frame` `from datatable import *` `df = Frame(V1=['A','B','C','D'], V2=[1,2,3,4])` # Filter frame to a…
Dale Kube
5 votes, 3 answers

Any simpler way to assign multiple columns in Python like R data.table :=

I'm wondering if there's a simpler way to assign multiple columns in Python, like `:=` in R data.table. For example, in Python I would have to write something like this: `df['Col_A'] = df.A/df.B` `df['Col_B'] = df.C/df.D` `df['Col_C'] = df.E/df.F` *…
5 votes, 2 answers

Error on installing new datatable version 1.0.0 in google colab machine

I can see that a new version of datatable has just been released today, and I'm trying to install it on a Google Colab machine with `!pip install datatable`. It shows the following error: Collecting datatable Using cached…
myamulla_ciencia
5 votes, 1 answer

Top N rows by group using python datatable

What is the proper way to query the top N rows by group in Python datatable? For example, to get the top 2 rows with the largest `v3` value by `id2`, `id4` group, I would use a pandas expression like the following: df.sort_values('v3',…
jangorecki
4 votes, 1 answer

How to convert pandas.DataFrame to datatable.Frame containing Int32 (nullable integer)?

I have a pandas.DataFrame containing the pandas nullable integer data type and want to convert it to an equivalent datatable.Frame object. However, this does not seem to be directly possible. What is the best way of doing the conversion without breaking stuff?…
Hyperplane
4 votes, 1 answer

Updating or adding multiple columns with pydatatable in style of R data.table's .SDcols

Given the iris data, I'd like to add new columns corresponding to all numeric columns found. I can do this by explicitly listing each numeric column: `from datatable import fread, f, mean, update` iris_dt =…
topchef
4 votes, 1 answer

How to filter observations for the multiple values passed in the I expression of Pydatatable frame?

I have a data frame with two columns, as shown below: `DT_EX = dt.Frame({'film':['Don','Warriors','Dragon','Chicago','Lion','Don','Chicago','Warriors'], 'gross':[400,500,600,100,200,300,900,1000]})` Here, in the first case, I would like to…
myamulla_ciencia
4 votes, 1 answer

How to count the number of instances for each category using group by in pydatatable

I have a dataframe as shown below, and I want to apply group-by and count operations to it to get the count of each category in a pydatatable way. Here is a sample dt that contains different programming languages: prog_lang_dt =…
myamulla_ciencia
4 votes, 1 answer

C++ compiler error while trying to install python datatable

I am trying to install `datatable` for Python using `pip` as below, but I am getting an error: `pip install datatable` The error shown is: Find an LLVM installation Environment variable LLVM is not set Environment variable LLVM7 is not set …
RKh
4 votes, 2 answers

Is there a way of performing arithmetic operations on entire Frame in Python datatable?

This question is about the recent h2o datatable package. I want to replace pandas code with this library to improve performance. The question is simple: I need to divide/sum/multiply/subtract an entire Frame or various selected columns by a…
carrasco
4 votes, 1 answer

How can I install data.table on my fedora distro

Looking at this GitHub repository, I'd like to be able to install datatable. I run a Fedora 26 distro that runs python3.6: statquant ~ python3 Python 3.6.4 (default, Mar 13 2018, 18:16:01) [GCC 7.3.1 20180130 (Red Hat 7.3.1-2)] on linux Type "help",…
statquant
3 votes, 2 answers

create row number by group, using python datatable

If I have a Python datatable like this: `from datatable import f, dt` `data = dt.Frame(grp=["a","a","b","b","b","b","c"], value=[2,3,1,2,5,9,2])` how do I create a new column that has the row number, by group? That is, what is the equivalent of R…
langtang
3 votes, 1 answer

datatable filter by not in

I am trying to subset a datatable by values that don't match a list: `DT1 = dt.Frame(A = ['a', 'b', 'c', 'd'])` `sel_rows = functools.reduce(operator.or_,(f.A != obs for obs in ['a', 'b']))` `DT1[sel_rows, :]` However, this returns all the rows; I'd expect…
Rafael
3 votes, 1 answer

How to fill null values in python datatable?

The pandas library has a really good function called `.fillna()` which can be used to fill null values: `df = df.fillna(0)` I am using the datatable library for my new assignment because it is very fast at loading and working with huge data. Does such a…
Maunish Dave