
I want to assign values to a new column based on a regex match in another column, using python-datatable syntax, along these lines:

DT[get rows by regex , assign value to new column, ]

import re
import datatable as dt
from datatable import f, Frame

DT = dt.Frame({'a': [1, 2, 3, 4], 'b': ['hi', 'foo', 'fat', 'cat']})
# Start new_col as a copy of column b
DT['new_col'] = DT[:, f.b]
# First pass: replace everything from the first "f" onward
DT['new_col'] = Frame([re.sub('f.*', 'words starting with f', s) for s in DT[:, "new_col"].to_list()[0]])
DT.head()
# Second pass: the same for "c"
DT['new_col'] = Frame([re.sub('c.*', 'words starting with c', s) for s in DT[:, "new_col"].to_list()[0]])
DT.head()

Is there a solution that stays within the datatable package, without converting via "to_list()" and without a Python loop?

Neither of these related questions covers this case. In "Python data.table row filter by regex", the result of the regex does not allow operations on a whole column. "How to filter rows in pandas by regex" applies to pandas, not datatable.

Roy2012
Zappageck

1 Answer


I think you can go with the solution below for now. The required functionality will presumably be looked into and added to datatable as the package matures.

Import libraries

import re
import datatable as dt
from datatable import f

Create a DT

DT_X = dt.Frame({'a' : [1,2,3,4], 'b' : ['hi', 'foo', 'fat', 'cat']})

Then do the required manipulation:

DT_X[:,f[:].extend({'new_col':dt.Frame([re.sub('f.*','words starting with f', s) for s in DT_X[:, f.b].to_list()[0]])})]

Output:

  |  a  b    new_col              
-- + --  ---  ---------------------
 0 |  1  hi   hi                   
 1 |  2  foo  words starting with f
 2 |  3  fat  words starting with f
 3 |  4  cat  cat
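
The expression above handles only the "f" pattern, while the question also relabels words starting with "c". As a minimal sketch, assuming you are fine with still building the values in plain Python, both patterns can be applied in a single pass. The names RULES and label_values are hypothetical helpers, not part of datatable. Note that this uses a full-string match, so unlike re.sub('f.*', ...) it leaves a word such as "leaf" untouched.

```python
import re

# Hypothetical rule table: (pattern, label). The first rule that matches wins.
RULES = [
    (re.compile(r"f.*"), "words starting with f"),
    (re.compile(r"c.*"), "words starting with c"),
]

def label_values(values, rules=RULES):
    """Return a new list in which each string that fully matches a rule
    is replaced by that rule's label; other strings are kept as they are."""
    labelled = []
    for value in values:
        for pattern, label in rules:
            if pattern.fullmatch(value):
                labelled.append(label)
                break
        else:
            # No rule matched, so keep the original value
            labelled.append(value)
    return labelled

new_col = label_values(["hi", "foo", "fat", "cat"])
print(new_col)
```

The resulting list can then be wrapped in dt.Frame and attached as new_col in the same way as the list comprehension above.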
myamulla_ciencia