Hive External table-CSV File- Header row

Question

Below is the Hive table I have created:

CREATE EXTERNAL TABLE Activity (
  column1 type,
  column2 type
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/exttable/';

In my HDFS location /exttable, I have a lot of CSV files, and each CSV file contains a header row. When I run SELECT queries, the result contains the header rows as well.

Is there any way in Hive to ignore the header row (the first line of each file)?

Possible duplicate of [this question](http://stackoverflow.com/questions/15751999/hive-external-table-skip-first-row/15753145#15753145). Basically same answer there as here with some more suggestions. @kgu87's solution works too. — Daniel Koverman, May 10 '13 at 14:48
Does this answer your question? [How to skip CSV header in Hive External Table?](https://stackoverflow.com/questions/15751999/how-to-skip-csv-header-in-hive-external-table) — Jacek Laskowski, Jun 29 '21 at 08:39

score 27 · Answer 1 · answered Feb 10 '15 at 11:25

As of Hive 0.13.0 you can skip the header row by setting this table property:


tblproperties ("skip.header.line.count"="1");

answered Feb 10 '15 at 11:25

tariqz

Note that this can be done to an existing table as well using ALTER TABLE. See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTableProperties – Mike Nov 08 '16 at 15:56
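As a rough sketch of what that comment describes, the statement below sets the property on the table from the question. The script file is a hypothetical temporary file, and the `hive -f` call is left commented out on the assumption that the Hive CLI may not be available where you try this:

```shell
# Write the ALTER TABLE statement to a hypothetical script file.
hql_file=$(mktemp)
cat > "$hql_file" <<'EOF'
ALTER TABLE Activity SET TBLPROPERTIES ("skip.header.line.count"="1");
EOF

# Where the Hive CLI is installed, the script could be run with:
#   hive -f "$hql_file"
cat "$hql_file"
```

The property only changes how Hive reads the files; the header lines stay in the files under the table location.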

score 25 · Answer 2 · edited Apr 22 '14 at 16:19

If you are using Hive version 0.13.0 or higher you can specify "skip.header.line.count"="1" in your table properties to remove the header.

For detailed information on the patch see: https://issues.apache.org/jira/browse/HIVE-5795

edited Apr 22 '14 at 16:19

Climbs_lika_Spyder

answered Apr 17 '14 at 21:39

Varun Gupta

score 7 · Answer 3 · edited Dec 27 '21 at 14:51

Let's say you want to load a CSV file like the one below, located at /home/test/que.csv:

1,TAP (PORTUGAL),AIRLINE
2,ANSA INTERNATIONAL,AUTO RENTAL
3,CARLTON HOTELS,HOTEL-MOTEL

Now, we need to create a location in HDFS that holds this data.

hadoop fs -put /home/test/que.csv /user/mcc

The next step is to create a table. There are two types to choose from (managed and external). Refer to this for choosing one.

Example for an external table:

create external table industry_ 
(
MCC string ,
MCC_Name string,
MCC_Group string
)       
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/mcc/'
tblproperties ("skip.header.line.count"="1");

Note: When the table is accessed via Spark SQL, the header row of the CSV is still shown as a data row. Tested on Spark 2.4.

score 3 · Answer 4 · answered May 09 '13 at 10:34

There is not. However, you can pre-process your files to strip the first row before loading them into HDFS:

tail -n +2 withfirstrow.csv > withoutfirstrow.csv
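Since the question mentions many CSV files, the same `tail` command can be run in a loop. This is only a sketch: the temporary directory, the `raw` and `clean` subdirectories, and the sample files are made up for illustration, and the `hadoop fs -put` step is left as a comment.

```shell
# Hypothetical working directory with two sample CSV files that have headers.
workdir=$(mktemp -d)
mkdir -p "$workdir/raw" "$workdir/clean"
printf 'id,name\n1,alpha\n2,beta\n' > "$workdir/raw/a.csv"
printf 'id,name\n3,gamma\n' > "$workdir/raw/b.csv"

# Copy each file without its first line into a separate directory,
# leaving the originals untouched.
for f in "$workdir"/raw/*.csv; do
  tail -n +2 "$f" > "$workdir/clean/$(basename "$f")"
done

# The cleaned files could then be uploaded, for example:
#   hadoop fs -put "$workdir"/clean/*.csv /exttable/
cat "$workdir"/clean/*.csv
```

Writing to a separate directory keeps the originals around in case a file turns out not to have a header after all.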

Alternatively, you can add a condition to the WHERE clause in Hive to filter out the header row.

answered May 09 '13 at 10:34

kgu87

How do you write a where clause in HIVE to ignore the first row? – KalEl Jan 05 '15 at 15:40
If the data contains header row, then you can use where col <> 'header column name' in HQL, assuming the data itself does not contain that value. A better option for Hive 0.13.0 or higher would be to use skip.header.line.count when defining the table. – kgu87 Jan 07 '15 at 13:49
I used the hive tblproperties `("skip.header.line.count"="1");` property. But I am still able to see the header in hdfs table location. How can i remove the header line from being inserted into hdfs – Neethu Lalitha Apr 03 '15 at 15:18

score 0 · Answer 5 · answered Jun 16 '15 at 10:05

If your Hive version doesn't support tblproperties ("skip.header.line.count"="1"), you can use the Unix command below to drop the first line (the column header) and then put the resulting file in HDFS.

 sed -n '2,$p' File_with_header.csv > File_with_No_header.csv

answered Jun 16 '15 at 10:05

sras

score -2 · Answer 6 · edited Aug 16 '16 at 22:49

To remove the header from the CSV file in place, use:

sed -i 1d filename.csv
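Because the in-place edit overwrites the file, a slightly more cautious variant keeps a backup copy. The sketch below uses a made-up sample file in a temporary directory; the `-i.bak` form with the suffix attached is accepted by both GNU and BSD sed.

```shell
# Hypothetical sample file with a header row.
sed_dir=$(mktemp -d)
printf 'id,name\n1,alpha\n' > "$sed_dir/filename.csv"

# Delete line 1 in place and keep the original as filename.csv.bak.
sed -i.bak '1d' "$sed_dir/filename.csv"

cat "$sed_dir/filename.csv"
```

Running the command twice removes a data row the second time, so the backup is a cheap safeguard.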

edited Aug 16 '16 at 22:49

Buddy

answered Aug 16 '16 at 22:43

Noel Reed
