How to write regex in Knime Unpivoting node to select columns

Question

I am attempting to unpivot COVID-19 data in Knime with the Unpivoting Node. The data available from Johns Hopkins at https://github.com/CSSEGISandData/COVID-19 is wide format where each new day of data is added as a new column.

I can manually make the columns with daily data be rows with the Unpivoting Node. However, each day I must reconfigure the node to account for the new column. There are 5 unpivoting nodes in my workflow where this must be done.

The Unpivoting Node has an option to use Regex to detect the columns to include or exclude but I am unable to make it work.

The available columns to include/exclude are a handful of field names such as Province/State, Country/Region, Lat, Long, plus the long list of date columns of the format m/d/yy (or m/dd/yy if later in the month). The Johns Hopkins data for the US is similar format but with additional columns for counties, iso codes, etc.

All of the date columns are this year (i.e. 2020).

For the top part of the Unpivoting node where Value Columns are specified, I can do what I need by using the Wildcard setting and the pattern */*/20
For the bottom part of the Unpivoting node, I need a wildcard or Regex expression to specify all the other columns.

All the other columns include alphabet characters. None are of the format m/d/yy. Therefore, some sort of Regex that includes any column with alphabetical column names, or specifies NOT m/d/yy should do the trick.

I tried using [\s\S]+ for help writing the Regex but nothing seems to work. I appreciate any help.

Are you sure you are not using the [`Unpivoting`](https://nodepit.com/node/org.knime.base.node.preproc.unpivot2.Unpivot2NodeFactory) node instead of the [`Unpivot`](https://github.com/aborg0/hits/blob/master/com.mind_era.knime.util/html/nodes/Unpivot.html) node from HiTS? (Maybe there are other Unpivot nodes I am not aware of, but yours look like Unpivoting.) — Gábor Bakos, Apr 21 '20 at 17:34
@Gabor Bakos... Yes, you are correct. It is the Unpivoting Node. I just edited my question to correct this. — DanV, Apr 21 '20 at 19:36

score 1 · Answer 1 · answered Apr 22 '20 at 14:16

1

If other column names don't have / you can use [^/]+. Check here for more explanation.

answered Apr 22 '20 at 14:16

ipazin

102
7

@ ipazin - Thank you. Would you expand your answer to show how I might exclude a substring? Some other columns include a / . I suppose I could rename these columns to remove the / , but if there is a way to exclude the substring "/20" it would be simpler. – DanV Apr 23 '20 at 14:16
@ ipizan - The answer provided by Gabor solved my problem in a non-regex way. I would still appreciate knowing how to make regex work if you have a suggestion for regex to exclude a column name that ends with "/20". :) – DanV Apr 24 '20 at 12:57
@DanV this one should do it: `(?!.+/20$).*` – ipazin Apr 24 '20 at 15:32
1

@ipizan - That worked. Thank you. I clicked upvote but I'm new to Stack Overflow and it says it recorded the vote but won't display it. – DanV Apr 26 '20 at 12:30

Gábor Bakos · Answer 2 · 2020-04-23T19:00:53.327

1

I think it might be easy to select the other columns manually in the Retained columns section. (That way you can easily remove some of them if you want to.) I assume the date columns are in a single group, so you can click on the first column to retain, scroll down to the first date column you do not want to retain, Shift+click on the previous column, include those, scroll to the column after the dates columns, and do similar. Please use the Enforce inclusion option to not generate warnings/errors when the new columns added. Example:

This way you can later easily remove columns from the retained.

PS: On your screenshot it seems you forgot to include the + from the end of the expression.

edited Apr 23 '20 at 19:00

answered Apr 22 '20 at 17:44

Gábor Bakos

8,982
52
35
52

@ Gabor Bakos - I'm trying to make the workflow run automatically without having to manually go into 5 Unpivoting nodes every day. Thus the attempt at Regex. Adding the + at the end of the expression almost worked. There are a couple of other column names that include a / so unfortunately they were also excluded. :( – DanV Apr 23 '20 at 14:12
Added a note about `Enforce inclusion`. Once you selected the required columns to retain, you can click on that option, so the new columns will automatically be added to the excluded list. – Gábor Bakos Apr 23 '20 at 19:02
@ Gabor Bakos - Enforce inclusion solved the problem. Thanks! When I tested it this morning, the new date appeared in the Exclude column as desired. I was previously misinterpreting the words "Enforce Inclusion" and "Enforce Exclusion" backwards from how the node works. I clicked upvote but Stack Overflow says I'm too new to the site to display my upvote. – DanV Apr 24 '20 at 12:53

How to write regex in Knime Unpivoting node to select columns

2 Answers2