Python data.table row filter by regex

Question

What is the Python datatable equivalent of R data.table's %like% operator?

Short example:

import re
from datatable import dt, f
dt_foo_bar = dt.Frame({"n": [1, 3], "s": ["foo", "bar"]})
dt_foo_bar[re.match("foo", f.s), :]  # works to filter by "foo"

I had expected something like this to work:

dt_foo_bar[re.match("fo", f.s), :]

But it fails with "expected string or bytes-like object". I'd love to start using the new datatable package in Python the way I use data.table in R, but I work much more with text data than with numeric data.

Thanks in advance.

Pasha · Answer 1 · 2019-08-05T19:03:18.437

5

Since version 0.9.0, datatable provides the .re_match() method, which filters rows by regular expression. For example:

>>> import datatable as dt
>>> dt_foo_bar = dt.Frame(N=[1, 3, 5], S=["foo", "bar", "fox"])
>>> dt_foo_bar[dt.f.S.re_match("fo."), :]
     N  S  
--  --  ---
 0   1  foo
 1   5  fox

[2 rows x 2 columns]

In general, .re_match() applies to a column expression and produces a new boolean column indicating whether each value matches the given regular expression or not.
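
To make the boolean-column idea concrete, here is a minimal plain-Python sketch of the same logic. It is not datatable's implementation: regex_mask is a hypothetical helper, and it assumes the pattern must match the whole string (my understanding of .re_match(), worth checking against the docs for your version) and that missing values count as non-matching.

```python
import re

def regex_mask(values, pattern):
    """Hypothetical helper: one boolean per value, True on a full regex match."""
    compiled = re.compile(pattern)
    # Assumption: None (missing) values are treated as non-matching.
    return [v is not None and compiled.fullmatch(v) is not None for v in values]

numbers = [1, 3, 5]
names = ["foo", "bar", "fox"]

# Build the boolean "column", then keep only the rows where it is True.
mask = regex_mask(names, "fo.")
filtered = [(n, s) for n, s, keep in zip(numbers, names, mask) if keep]

print(mask)      # [True, False, True]
print(filtered)  # [(1, 'foo'), (5, 'fox')]
```

Under the whole-string assumption, a pattern such as "fo" on its own would match neither "foo" nor "fox"; something like "fo.*" would be needed to get %like%-style prefix matching.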

edited Aug 05 '19 at 19:03

answered Mar 06 '19 at 20:13

Pasha


I could not find this feature, or any string related data processing in the documentation. – sammywemmy Jun 13 '20 at 08:38

Do you also know how to generate a new column based on the regex? I use this at the moment, but it doesn't look like the best way with the to_list conversion: DT['new_name'] = Frame([re.sub('some_regex_pattern', 'value_for_new_column', s) for s in DT[:, "column_for_regex"].to_list()[0]]) – Zappageck Jun 15 '20 at 11:33
@Zappageck I'm afraid that's not possible right now – Pasha Jun 16 '20 at 22:01
