alter table abc add columns (stats1 map<string,string>, stats2 map<string,string>)

I altered my table with the query above, but when checking the data afterwards I got NULLs for both extra columns. I'm not getting any data.


Veeru Chow
  • What data do you have? How did you create the table? Give more information to analyze. – Sathiyan S Nov 14 '16 at 08:52
  • Thanks for the reply, Sathiyan. The table was already created through framework support. I used alter table abc add columns (stats1 map, stats2 map) to add the columns. In my sample data I have GB:4 and US:5 for those two extra columns, but in my data I found NULLs in both manually added columns. – Veeru Chow Nov 14 '16 at 12:08
  • In the CREATE TABLE query you should specify the map delimiter as ':'. Check what the row format should be for a map and you will get the answer. – Sathiyan S Nov 14 '16 at 13:15
  • Hi, I am trying to RENAME my table with ALTER TABLE stage_ondemandstats RENAME TO stageondemandstats, but I get an error: "Unable to alter table. Unable to access old location". Can you tell me what the solution is? – Veeru Chow Nov 15 '16 at 05:45
  • Check whether you (the user) have permissions (rwx) to modify the /user/hive/warehouse/stage_ondemandstats/ folder in HDFS. – Sathiyan S Nov 15 '16 at 08:08
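A sketch of the kind of CREATE TABLE Sathiyan is describing (the table name, columns, and field delimiters here are illustrative, assuming text-format data where map entries look like GB:4,US:5):

```sql
-- Hypothetical table; delimiters chosen to match data like GB:4,US:5
CREATE TABLE abc_example (
  id STRING,
  stats1 MAP<STRING,STRING>,
  stats2 MAP<STRING,STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ','
  MAP KEYS TERMINATED BY ':'
STORED AS TEXTFILE;
```

If the delimiters declared in the row format don't match what is actually in the data files, Hive cannot parse the map entries and returns NULL for those columns.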

4 Answers


CASCADE is the solution.

Query:

ALTER TABLE dbname.table_name ADD columns (column1 string,column2 string) CASCADE; 

This changes the columns in the table's metadata and cascades the same change to the metadata of all partitions. RESTRICT is the default, which limits the column change to the table metadata only.
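To check whether the change actually reached a given partition's metadata, you can describe that partition directly (the database, table, and partition spec below are illustrative):

```sql
-- Shows the partition-level schema, which should now include the new columns
DESCRIBE dbname.table_name PARTITION (dt='2016-11-14');
```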

Pravat Sutar

As others have noted, CASCADE will change the metadata for all partitions. Without CASCADE, if you want old partitions to include the new columns, you'll need to DROP those partitions first and then refill them; INSERT OVERWRITE without the DROP won't work, because the partition metadata won't be updated to the new table metadata.

Let's say you have already run alter table abc add columns (stats1 map<string,string>, stats2 map<string,string>) without CASCADE by accident, and then you INSERT OVERWRITE an old partition without dropping it first. The data will be stored in the underlying files, but if you query that partition from Hive, the new columns won't show because the metadata wasn't updated. This can be fixed without rerunning the INSERT OVERWRITE, as follows:

  1. Run SHOW CREATE TABLE dbname.tblname and copy all the column definitions that existed before adding new columns
  2. Run ALTER TABLE dbname.tblname REPLACE COLUMNS ({paste in col defs besides columns to add here}) CASCADE
  3. Run ALTER TABLE dbname.tblname ADD COLUMNS (newcol1 int COMMENT "new col") CASCADE
  4. be happy that the metadata has been changed for all partitions =)

As an example of steps 2-3:

DROP TABLE IF EXISTS  junk.testcascade ;
CREATE TABLE junk.testcascade (
startcol INT
)
partitioned by (d int)
stored as parquet
;
INSERT INTO TABLE junk.testcascade PARTITION(d=1)
VALUES
    (1),
    (2)
;

INSERT INTO TABLE junk.testcascade PARTITION(d=2)
VALUES
    (1),
    (2)
;

SELECT * FROM junk.testcascade ;
+-----------------------+----------------+--+
| testcascade.startcol  | testcascade.d  |
+-----------------------+----------------+--+
| 1                     | 1              |
| 2                     | 1              |
| 1                     | 2              |
| 2                     | 2              |
+-----------------------+----------------+--+

--no cascade! oops
ALTER TABLE junk.testcascade ADD COLUMNS( testcol1 int, testcol2 int) ;

INSERT OVERWRITE TABLE junk.testcascade PARTITION(d=3)
VALUES
    (1,1,1),
    (2,1,1)
;

INSERT OVERWRITE TABLE junk.testcascade PARTITION(d=2)
VALUES
    (1,1,1),
    (2,1,1)
;

--okay! because this partition was created after altering the metadata
select * FROM junk.testcascade where d=3;
+-----------------------+-----------------------+-----------------------+----------------+--+
| testcascade.startcol  | testcascade.testcol1  | testcascade.testcol2  | testcascade.d  |
+-----------------------+-----------------------+-----------------------+----------------+--+
| 1                     | 1                     | 1                     | 3              |
| 2                     | 1                     | 1                     | 3              |
+-----------------------+-----------------------+-----------------------+----------------+--+

--not okay even though we inserted =( because the metadata isn't changed
select * FROM junk.testcascade where d=2;
+-----------------------+-----------------------+-----------------------+----------------+--+
| testcascade.startcol  | testcascade.testcol1  | testcascade.testcol2  | testcascade.d  |
+-----------------------+-----------------------+-----------------------+----------------+--+
| 1                     | NULL                  | NULL                  | 2              |
| 2                     | NULL                  | NULL                  | 2              |
+-----------------------+-----------------------+-----------------------+----------------+--+

--cut back to original columns
ALTER TABLE junk.testcascade REPLACE COLUMNS( startcol int) CASCADE;

--add
ALTER table junk.testcascade ADD COLUMNS( testcol1 int, testcol2 int) CASCADE;

--it works!
select * FROM junk.testcascade where d=2; 
+-----------------------+-----------------------+-----------------------+----------------+--+
| testcascade.startcol  | testcascade.testcol1  | testcascade.testcol2  | testcascade.d  |
+-----------------------+-----------------------+-----------------------+----------------+--+
| 1                     | 1                     | 1                     | 2              |
| 2                     | 1                     | 1                     | 2              |
+-----------------------+-----------------------+-----------------------+----------------+--+
DVL

To add columns to a partitioned table, you need to recreate its partitions. Suppose the table is external and the data files already contain the new columns. Then:

  1. ALTER TABLE ... ADD COLUMNS ...
  2. Recreate the partitions: for each partition, DROP it, then re-create it. A newly created partition will inherit the table schema.

Alternatively, you can drop and recreate the table, then either create all the partitions again or restore them by simply running the MSCK REPAIR TABLE abc command. The equivalent command in Amazon Elastic MapReduce (EMR)'s version of Hive is ALTER TABLE table_name RECOVER PARTITIONS. See the manual here: RECOVER PARTITIONS

Also, in Hive 1.1.0 and later you can use the CASCADE option of ALTER TABLE ADD|REPLACE COLUMNS. See the manual here: ADD COLUMN
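A rough sketch of both recovery paths, assuming an external table with the default partition directory layout (the table and partition names are illustrative):

```sql
-- Path 1: drop and re-add one partition so it picks up the new table schema
ALTER TABLE abc DROP IF EXISTS PARTITION (d=1);
ALTER TABLE abc ADD PARTITION (d=1);

-- Path 2: after recreating the table with the full schema,
-- rediscover all partitions from the files already in HDFS
MSCK REPAIR TABLE abc;
```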

These suggestions work for external tables.

leftjoin

This solution only works if your data is partitioned and you know the location of the latest partition. In that case, instead of doing a partition recovery or repair, which are costly operations, you can do something like:

  • Read the partitioned table and get the schema details
  • Read the table you want to update
  • Find which columns differ and run an ALTER TABLE for each of them

Scala code for reference:

def updateMetastoreColumns(spark: SparkSession, partitionedTablePath: String, toUpdateTableName: String): Unit = {
    //fetch all column names along with their corresponding datatypes from latest partition
    val partitionedTable = spark.read.orc(partitionedTablePath)
    val partitionedTableColumns = partitionedTable.columns zip partitionedTable.schema.map(_.dataType.catalogString)

    //fetch all column names along with their corresponding datatypes from currentTable
    val toUpdateTable = spark.read.table(toUpdateTableName)
    val toUpdateTableColumns = toUpdateTable.columns zip toUpdateTable.schema.map(_.dataType.catalogString)

    //check if new columns are present in newer partition
    val diffColumns = partitionedTableColumns.diff(toUpdateTableColumns)

    //update the metastore with new column info
    diffColumns.foreach {column: (String, String) => {
      spark.sql(s"ALTER TABLE ${toUpdateTableName} ADD COLUMNS (${column._1} ${column._2})")
    }}
}

This will help you dynamically find the latest columns added to a newer partition and update your metastore on the fly.

Eapen Jose