Python Datatable/Pydatatable: How to filter rows in datatable by regex and assign value to new variable according to filter

Question

I want to assign values to a new column, based on the regex match in another column in python-datatable syntax.

DT[get rows by regex , assign value to new column, ]

import pandas as pd
import datatable as dt
from datatable import f, Frame
import re as re

DT = dt.Frame({'a' : [1,2,3,4], 'b' : ['hi', 'foo', 'fat', 'cat']})
DT['new_col']=DT[:,f.b]
DT['new_col'] = Frame([re.sub('f.*','words starting with f', s) for s in DT[:, "new_col"].to_list()[0]])
DT.head()
DT['new_col'] = Frame([re.sub('c.*','words starting with c', s) for s in DT[:, "new_col"].to_list()[0]])
DT.head()

Is there a solution that stays within the datatable package, without converting via "to_list()" and without a loop?

The regex approach in this question does not allow operations on a whole column: "Python data.table row filter by regex". This one is for pandas, not datatable: "How to filter rows in pandas by regex".

score 1 · Answer 1 · answered Jun 19 '20 at 09:22

I think for now you can go with this kind of solution; the required functionality will be looked into and added to datatable as it matures.

Import libraries

import pandas as pd
import datatable as dt
from datatable import f,by
import re as re

Create a DT

DT_X = dt.Frame({'a' : [1,2,3,4], 'b' : ['hi', 'foo', 'fat', 'cat']})

Then do the required manipulation:

DT_X[:,f[:].extend({'new_col':dt.Frame([re.sub('f.*','words starting with f', s) for s in DT_X[:, f.b].to_list()[0]])})]

Output:

  |  a  b    new_col              
-- + --  ---  ---------------------
 0 |  1  hi   hi                   
 1 |  2  foo  words starting with f
 2 |  3  fat  words starting with f
 3 |  4  cat  cat
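The question applies two substitutions one after the other, rebuilding the column each time. As a rough sketch, both rules can be applied in a single pass over the values before the list is handed back to datatable. The helper name `label_by_prefix` and the `rules` list are made up for this illustration and are not part of datatable. Note one deliberate difference: the sketch uses `re.match`, so only strings that begin with the pattern are relabelled, whereas `re.sub('f.*', ...)` would also rewrite an "f" found later in a string. It is still plain Python, so it does not remove the loop the question asks about.

```python
import re

# Hypothetical helper (not a datatable function): return the label of the
# first rule whose pattern matches at the start of the string; strings that
# match no rule are kept unchanged.
def label_by_prefix(values, rules):
    compiled = [(re.compile(pattern), label) for pattern, label in rules]
    result = []
    for value in values:
        for regex, label in compiled:
            if regex.match(value):  # match() anchors at the start of the string
                result.append(label)
                break
        else:
            result.append(value)  # no rule matched
    return result

# Illustrative rules mirroring the two substitutions in the question.
rules = [
    (r"f.*", "words starting with f"),
    (r"c.*", "words starting with c"),
]

new_col = label_by_prefix(["hi", "foo", "fat", "cat"], rules)
print(new_col)
```

The resulting list can then be attached the same way as in the question, for example `DT['new_col'] = dt.Frame(new_col)`, with the input taken from `DT[:, f.b].to_list()[0]`.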
