
How do I execute a wildcard/regex search in Data Catalog (Google Cloud Platform)?

  • It would make sense to search metadata across column names and tag attributes (and their values).

The current documentation only lists very strict search behavior, e.g. tag:data_gov_template.hasPII(=true)

  • What I need is a result for "PII", without having to specify the exact template name etc.

e.g. labels:etl

  • If I search for just etl, there is no result.

(Are metadata attributes and their values not directly searchable?)

InLaw
  • According to the documentation you shared, you can use name:x, which matches all entities that satisfy the predicate ***x***, so this behaviour is similar to wildcards. Does that address your question? [Here](https://cloud.google.com/data-catalog/docs/concepts/overview#how_works) is an overview of how Data Catalog works. – Alexandre Moraes Nov 10 '20 at 12:17
  • I updated my question with examples. You are right that the predicate "x" is very broad (and not a controlled, precise search). – InLaw Nov 10 '20 at 13:22
  • E.g. `column:difference.old_mode` does not work even though it is the exact name of the column. – InLaw Nov 10 '20 at 13:25
  • @AlexandreMoraes the docs sometimes say very little and are sometimes incorrect. It is interesting to hear what Google internally thinks of the current state of Data Catalog (e.g. at 22:50): https://www.youtube.com/watch?v=gCXgZ5ZkJeI – InLaw Nov 10 '20 at 13:43
  • After reading your update: for ***label:etl*** to work, your data assets must be labeled, as explained here for BigQuery. Have you labeled the data assets you want to retrieve? ***label:etl*** returns the data assets that have a label whose key contains **etl** as a substring. – Alexandre Moraes Nov 10 '20 at 14:01
  • Regarding your comment that `column:difference.old_mode` does not work: searching for nested columns is currently not supported in Data Catalog. There is an open feature request for this, which you can track [here](https://b.corp.google.com/issues/164166219). Do you have any other concerns? If not, I will sum up all the information I shared as an answer to further contribute to the community. – Alexandre Moraes Nov 10 '20 at 14:09
  • As written in the question, a search for "etl" should return results NOT ONLY for labels:etl [there is no flexible regex search - that is like IT kindergarten, isn't it?] AND for metadata -> e.g. are tag attributes and their values not searchable at all? – InLaw Nov 10 '20 at 14:31
  • As written in the [documentation](https://cloud.google.com/data-catalog/docs/how-to/search-reference#qualified_predicates), you can search through your data assets' metadata using the described qualifiers, each of which has a specific reach. If you use the label qualifier, it goes through the metadata and searches for matching substrings in the label (for all assets that have a label). Does that address your question? – Alexandre Moraes Nov 10 '20 at 16:12

1 Answer


From your use case, I understand that you want to search on a particular metadata attribute, such as a PII tag field, right?

For tagged assets

If you don't care about the template name, you can use the tag:x search facet.

So if your templates (data_gov_template, data_curator_template, data_etl_template) all contain the same tag field name, has_pii, you can search using:

tag:has_pii

This returns all assets with that metadata attribute, no matter what the template name is.

For columns

You can use the column:x search facet to match a substring of the column name in the schema of the data asset. Nested columns are not supported yet.

For labels

You can use the labels:bar search facet to find data assets that have a label (with some value) whose key contains bar as a substring.

You are also able to search on their values. So yes, the metadata/attributes and values are searchable.

However, this is not regex matching: it is a substring match when the search facet uses a colon (:), as in labels:bar, and an exact match when the search facet uses equals (=), as in type=table.
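
Since each facet is searched in isolation, one possible workaround for the "one keyword, several facets" case is to build a single query string with one predicate per facet. The following is only a minimal sketch: `build_query` and `DEFAULT_FACETS` are hypothetical names, and it assumes the search syntax accepts OR between predicates, which you should verify against the search reference before relying on it.

```python
from typing import Iterable

# Facets mentioned in this answer; this list is illustrative, not exhaustive.
DEFAULT_FACETS = ("tag", "column", "labels")


def build_query(term: str,
                facets: Iterable[str] = DEFAULT_FACETS,
                exact: bool = False) -> str:
    """Build one predicate per facet and join them with OR.

    Assumption: the search syntax accepts OR between predicates.
    ':' requests a substring match, '=' an exact match.
    """
    term = term.strip()
    if not term:
        raise ValueError("search term must not be empty")
    operator = "=" if exact else ":"
    return " OR ".join(f"{facet}{operator}{term}" for facet in facets)


if __name__ == "__main__":
    # Substring match across tag fields, column names and label keys.
    print(build_query("etl"))
    # Exact match on a single facet.
    print(build_query("table", facets=["type"], exact=True))
```

Here `build_query("etl")` produces `tag:etl OR column:etl OR labels:etl`, which you would then paste into the search box or pass as the query to the search API. This is still not regex matching; it only saves you from running the same keyword once per facet.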

mesmacosta
  • Thanks, but that's exactly the issue. There is no flexible search across attributes, only isolated searches for labels OR tags OR columns OR policies .. https://issuetracker.google.com/issues/172933221 – InLaw Nov 22 '20 at 18:15