2

I have a hive table like this:

CREATE TABLE `abtestmsg_orc`(
  `eventname` string COMMENT 'AB测试方案上报事件:ABTest', 
  `eventtime` string COMMENT '事件上报时间', 
  `sessionid` string COMMENT 'Session标识', 
  `appkey` string COMMENT 'app标识', 
  `deviceid` string COMMENT 'device标识', 
  `content` string COMMENT 'AB测试方案的内容,格式是一个 json 字符串', 
  `item` array<struct<key:string,value:string>> COMMENT '扩展字段')
PARTITIONED BY ( 
  `dt` string COMMENT '??')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  'hdfs://hdfsmaster/hive/connector_transfer/abtestmsg_orc'

Now , I want to change the file format from ORC to parquet and change the location to other hdfs directory which contains the parquet files.So I at first I try to change the format:

alter table abtestmsg_orc1 set fileformat parquet;

but it's a pitty that it throws an exception:

FAILED: Execution Error, return code 1 from

org.apache.hadoop.hive.ql.exec.DDLTask. Changing file format (from ORC) is not supported for table connector_transfer.abtestmsg_orc1
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Changing file format (from ORC) is not supported for table connector_transfer.abtestmsg_orc1 (state=08S01,code=1)

I guess this exception means that when I change the fileformat , not only does hive change the table metadata, but also try to change all the data format from orc to parquet.But from the official doc , it says:

ALTER TABLE table_name [PARTITION partition_spec] SET FILEFORMAT file_format;

**This statement changes the table's (or partition's) file format. For available file_format options, see the section above on CREATE TABLE. The operation only changes the table metadata. Any conversion of existing data must be done outside of Hive.**

But what I want to achive is to make its location to a parquet directory.

So , what can I do to achive this ?

wuchang
  • 3,003
  • 8
  • 42
  • 66

3 Answers3

2

You need to create another table abtestmsg_parquet which is STORED AS PARQUET and with the location for your desired parquet files:

CREATE TABLE abtestmsg_parquet(
  eventname string COMMENT 'AB测试方案上报事件:ABTest', 
  eventtime string COMMENT '事件上报时间', 
  sessionid string COMMENT 'Session标识', 
  appkey string COMMENT 'app标识', 
  deviceid string COMMENT 'device标识', 
  content string COMMENT 'AB测试方案的内容,格式是一个 json 字符串', 
  item array<struct<key:string,value:string>> COMMENT '扩展字段')
PARTITIONED BY ( 
  dt string COMMENT '??')
STORED AS PARQUET
LOCATION
  'hdfs://hdfsmaster/hive/connector_transfer/abtestmsg_parquet/'

Then u can create the files with dynamic partitioning. run:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

INSERT INTO TABLE abtestmsg_parquet PARTITION(dt)

SELECT eventname, eventtime, sessionid, appkey, deviceid, content, item, dt
FROM abtestmsg_orc;
mrsrinivas
  • 34,112
  • 13
  • 125
  • 125
belostoky
  • 934
  • 2
  • 11
  • 22
1

In my situation, if I mean to change format to ORC, the command to change fileformat is:

"alert table myTableName set FILEFORMAT INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'" SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde';

that is, INPUTFORMAT, OUTPUTFORMAT and SERDE should be passed correspondingly.

mashroom
  • 13
  • 6
0

How to Update/Drop a Hive Partition? is a similar article, where I found that in order to change the fileformat I needed to do use <schema> before running the alter table command, even if the table name includes the schema... That said, this won't work for spark.sql.

But if your schema is myschema, you can do

hive> use myschema;
hive> ALTER TABLE myschema.abtestmsg_orc PARTITION(dt='<dt_to_migrate>') SET FILEFORMAT PARQUET 

and this will work, but it will not work without the use command.

Marcus
  • 2,128
  • 20
  • 22