
I am trying to match a word against a table's column names by iterating over a list that contains those names. columnMap is a Map that holds the column data in the form: key->locations and

value->period:int|name:String|uniform_period:String|database:String|amount:Double|period_num:Int

One of these columns, period, is a Hive partition column. The partition column names arrive in a separate List, and to make sure each partition column is present among the table columns, I tried using contains as below.

  val columnMap = Map[String, String]()
  def partitionDataTypes(hiveTab:String, prtn_String_columns:String):String = {
    val pcols = prtn_String_columns.split(",").toSeq
    var pList = scala.collection.mutable.TreeSet[String]()
    columnMap.foreach {
      case (k, v) => if (columnMap.contains(hiveTab)) {
        var cols = columnMap(hiveTab).split("\\|")
        for (c <- cols) {
          for (p <- pcols) {
            if (c.contains(p)) pList += c // += adds c to the set; ++ does not modify pList
          }
        }
      }
    }
    println("Partition Columns: " + pList.toString())
    pList.toString()
  }

If the input list pcols contains period, the contains check returns the entries for period, uniform_period and period_num, whereas I only want the exact match "period". How do I form a regex pattern that matches an element of the columnMap data only when it starts with an element of pcols and that name ends right before the ":", as in the loop above?

Metadata
  • What if you use `.startsWith`? – Wiktor Stribiżew Jun 19 '19 at 10:18
  • If I used .startsWith, then I get period & period_num because "period_num" starts with "period" – Metadata Jun 19 '19 at 10:31
  • And what do you need? You say you want "*to match a string **that begins with** an exact given word*". Try `.equals()` if you need a full string match. – Wiktor Stribiżew Jun 19 '19 at 10:32
  • If the input contains "period", I am trying to apply a regex on the elements of the List: cols to find if there is an exact match. Ex: the list: pcols contains only one element: "period". This should be compared with all the elements in the "columnMap". The columnMap contains data like "value->period:int|name:String|uniform_period:String|database:String|amount:Double|period_num:Int". So I split it into List to match each of its element with 'pcols'. The only element from columnMap that matched pCols is 'period:int'. – Metadata Jun 19 '19 at 10:41
  • So "contains, equals, startsWith" are not the right options to use. Here the column name and datatype are separated by ":" Hence I was looking for a regex pattern that maps each element of "columnMap" with the element of "pcols" before the ":". Hoping I am a bit clear now. – Metadata Jun 19 '19 at 10:44
  • Try [`(?<=[|>])period:[^|]*`](https://regex101.com/r/UKeMaN/1) – Wiktor Stribiżew Jun 19 '19 at 10:49
  • Note it is not quite clear as your code cannot be used to repro the issue and it is clear you split with a comma and then with a pipe, but you have no commas in your input. – Wiktor Stribiżew Jun 19 '19 at 10:59
  • @WiktorStribiżew I tried the equals() and it worked. Now I think regex is not the solution for every string comparison scenario. – Metadata Jun 22 '19 at 06:27
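
The comments settle on comparing the column name itself rather than searching inside the whole `name:type` entry. Here is a minimal sketch of that idea, assuming every entry has the form `name:type` and entries are separated by `|`. The names `PartitionMatch`, `exactMatches` and `regexMatches` are hypothetical and not from the original code. A regex variant is included for comparison; it quotes the column name and requires it to be followed directly by `:`.

```scala
import java.util.regex.Pattern

object PartitionMatch {
  // Hypothetical helper: keep only the "name:type" entries whose name
  // is exactly one of the partition columns.
  def exactMatches(columns: String, partitionCols: Seq[String]): Seq[String] =
    columns.split("\\|").toSeq.filter { entry =>
      val name = entry.split(":", 2)(0) // text before the first ':'
      partitionCols.contains(name)      // Seq.contains compares whole elements
    }

  // Regex variant: the entry must start with the literal column name,
  // immediately followed by ':'.
  def regexMatches(columns: String, partitionCols: Seq[String]): Seq[String] =
    columns.split("\\|").toSeq.filter { entry =>
      partitionCols.exists(p => entry.matches(Pattern.quote(p) + ":.*"))
    }
}
```

With the value string from the question and `Seq("period")` as the partition columns, both helpers should keep only `period:int`, because `uniform_period` and `period_num` are different names once the part before `:` is compared as a whole.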

0 Answers