34

I have an ORC table in Hive and I want to drop a column from this table:

ALTER TABLE table_name DROP col_name;

but I am getting the following exception:

Error occurred executing hive query: OK FAILED: ParseException line 1:35 mismatched input 'user_id1' expecting PARTITION near 'drop' in drop partition statement

Can anyone help me or suggest a way to do this? Note: I am using Hive 0.14.

sheilak
Aryan Singh

8 Answers

47

You cannot drop a column directly from a table with the command ALTER TABLE table_name DROP col_name;

The only way to drop a column is to use the REPLACE COLUMNS command. Say I have a table emp with columns id, name, and dept, and I want to drop the id column of table emp. In the REPLACE COLUMNS clause, list every column that you want to remain part of the table. The command below will drop the id column from the emp table.

 ALTER TABLE emp REPLACE COLUMNS( name string, dept string);
Reena Upadhyay
  • Thanks for the reply @Reena. I have an ORC table and I even tried the REPLACE statement, but it didn't work either. – Aryan Singh Dec 14 '15 at 05:40
  • Can you please try it with Hive 1.2? It works on Hive 1.2. – Reena Upadhyay Dec 14 '15 at 05:56
  • I had the same problem and tried the above to replace columns, but it failed: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Replacing columns cannot drop columns for table sandbox6.alc_ont_oe_order_headers_all. SerDe may be incompatible – ChuckCottrill Jul 11 '17 at 20:28
  • I am having the same issue as well. How can I drop the column from a partitioned table? The above command doesn't work for me and I am receiving the same error. – Teja Oct 31 '18 at 22:59
  • I am having issues with a parquet table. Any suggestions there? – Joha Feb 14 '19 at 12:57
  • what if I have hundreds of columns? – Danyal Jul 10 '23 at 22:09
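Regarding the last comment: with hundreds of columns, the REPLACE COLUMNS statement can be generated rather than typed by hand. A minimal Python sketch (hypothetical helper; it assumes you already have the column name/type pairs, e.g. parsed from the output of DESCRIBE table_name):

```python
def build_replace_columns(table, columns, drop):
    """Build an ALTER TABLE ... REPLACE COLUMNS statement that keeps
    every (name, type) pair in `columns` except the names in `drop`."""
    kept = ["%s %s" % (name, ctype) for name, ctype in columns if name not in drop]
    return "ALTER TABLE %s REPLACE COLUMNS (%s);" % (table, ", ".join(kept))

# Example: drop the id column from the emp table used above.
print(build_replace_columns(
    "emp",
    [("id", "int"), ("name", "string"), ("dept", "string")],
    drop={"id"},
))
# ALTER TABLE emp REPLACE COLUMNS (name string, dept string);
```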
9

There is also a "dumb" way of achieving the end goal: create a new table without the unwanted column(s). Hive's regex column matching makes this rather easy.

Here is what I would do:

-- make a copy of the old table
ALTER TABLE table RENAME TO table_to_dump;

-- make the new table without the columns to be deleted
CREATE TABLE table AS
SELECT `(col_to_remove_1|col_to_remove_2)?+.+`
FROM table_to_dump;

-- dump the table 
DROP TABLE table_to_dump;

If the table in question is not too big, this should work just fine.
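The backtick-quoted pattern in the SELECT above is a Hive regex column specification that matches every column except the listed ones (note that later Hive versions require hive.support.quoted.identifiers=none for this). A small Python sketch for building that pattern from an arbitrary list of columns to exclude (the column names here are placeholders):

```python
def exclusion_regex(cols_to_remove):
    """Build a Hive regex column spec matching every column NOT named
    in cols_to_remove. The possessive `?+` group consumes an excluded
    name outright, so the trailing `.+` cannot match it exactly."""
    return "`(" + "|".join(cols_to_remove) + ")?+.+`"

select = "SELECT %s FROM table_to_dump" % exclusion_regex(
    ["col_to_remove_1", "col_to_remove_2"]
)
print(select)
# SELECT `(col_to_remove_1|col_to_remove_2)?+.+` FROM table_to_dump
```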

ccy
  • Not sure why this wouldn't work on a very large table; however, it is elegant and easy. – parisni Oct 10 '19 at 12:15
  • To use regex matching in versions of Hive later than 0.13, the property hive.support.quoted.identifiers=none must be set. – DylWylie Sep 10 '20 at 08:32
  • Important note: when using `CREATE TABLE table AS SELECT` you MUST also specify your original partition scheme and storage format (at least), otherwise the Hive defaults will be used, which is most likely NOT what you want! Additional caveat: specifying a partition scheme in CTAS requires Hive 3.2.0+! – Naitree Oct 23 '20 at 06:20
5

Suppose you have an external table, organization.employee, defined as follows (TBLPROPERTIES not included):

hive> show create table organization.employee;
OK
CREATE EXTERNAL TABLE `organization.employee`(
      `employee_id` bigint,
      `employee_name` string,
      `updated_by` string,
      `updated_date` timestamp)
    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
    OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
    LOCATION
      'hdfs://getnamenode/apps/hive/warehouse/organization.db/employee'

You want to remove the updated_by and updated_date columns from the table. Follow these steps:

Create a temp table replica of organization.employee:

hive> create table organization.employee_temp as select * from organization.employee;

Drop the main table organization.employee:

hive> drop table organization.employee;

Remove the underlying data from HDFS (you need to come out of the hive shell):

[nameet@ip-80-108-1-111 myfile]$ hadoop fs -rm hdfs://getnamenode/apps/hive/warehouse/organization.db/employee/*

Create the table with the columns removed as required:

hive> CREATE EXTERNAL TABLE `organization.employee`(
  `employee_id` bigint,
  `employee_name` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://getnamenode/apps/hive/warehouse/organization.db/employee'

Insert the original records back into the original table:

hive> insert into organization.employee 
select employee_id, employee_name from organization.employee_temp;

Finally, drop the temp table that was created:

hive> drop table organization.employee_temp;
Nameet Nayan
2
ALTER TABLE emp REPLACE COLUMNS( name string, dept string);

The statement above can only change the schema of a table, not the data. One solution to this problem is to copy the data into a new table:

INSERT INTO TABLE <new_table> SELECT <selective columns> FROM <old_table>;
Sandeep
1

ALTER TABLE is not yet supported for non-native tables, i.e. tables created with a STORED BY clause in CREATE TABLE.

See https://cwiki.apache.org/confluence/display/Hive/StorageHandlers

Sat
0

After a lot of mistakes, and in addition to the explanations above, I would add some simpler examples.

Case 1: Add a new column named new_column

ALTER TABLE schema.table_name
ADD COLUMNS (new_column INT COMMENT 'new number column');

Case 2: Rename the column new_column to no_of_days

ALTER TABLE schema.table_name 
CHANGE new_column no_of_days INT;

Note that when renaming, both columns should be of the same datatype, as with INT above.

Hari_pb
-1

For an external table this is simple and easy. Just drop the table (which removes only the schema; the underlying data stays in place), edit the CREATE TABLE statement, and finally create the table again with the new schema. Example table: aparup_test.tbl_schema_change, from which the column id will be dropped. Steps:

------------- show create table to fetch schema ------------------

spark.sql("""
show create table aparup_test.tbl_schema_change
""").show(100,False)

Output:
CREATE EXTERNAL TABLE aparup_test.tbl_schema_change(name STRING, time_details TIMESTAMP, id BIGINT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'gs://aparup_test/tbl_schema_change'
TBLPROPERTIES (
  'parquet.compress' = 'snappy'
)

------------- drop table --------------------------------

spark.sql("""
drop table aparup_test.tbl_schema_change
""").show(100,False)

------------- edit create table schema by dropping column "id"------------------

spark.sql("""
CREATE EXTERNAL TABLE aparup_test.tbl_schema_change(name STRING, time_details TIMESTAMP)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = '1'
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION 'gs://aparup_test/tbl_schema_change'
TBLPROPERTIES (
  'parquet.compress' = 'snappy'
)
""")

------------- sync up table schema with parquet files ------------------

spark.sql("""
msck repair table aparup_test.tbl_schema_change
""").show(100,False)

==================== DONE =====================================
Oleksandr Savchenko
-7

Even the query below is working for me.

Alter table tbl_name drop col_name
Kursad Gulseven