Hive can store table data as CSV in HDFS using OpenCSVSerde. You can create an external table over the HDFS folder where you want the CSV files to appear:
CREATE EXTERNAL TABLE `csv_export`(
  wf_id string,
  file_name string,
  row_count int
)
COMMENT 'output table'
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'escapeChar'='\\',
  'quoteChar'='\"',
  'separatorChar'=',')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  '/user/admin/export/folder';
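If you want to confirm the table points at the intended folder, Hive's DESCRIBE FORMATTED prints the location and serde settings:
DESCRIBE FORMATTED csv_export;  -- shows Location, SerDe Library, and serde parameters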
Then you can insert data into it using any SELECT you want:
INSERT INTO csv_export SELECT a, b, c FROM source_table;
The INSERT INTO ... SELECT statement could be a sophisticated join that looks for data quality issues. The resulting output would be the bad records.
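For example, here is a minimal sketch, assuming hypothetical source tables workflow_runs and file_manifest that should agree on row counts per wf_id (the table and column names are illustrative, not from the original):
-- workflow_runs and file_manifest are hypothetical source tables;
-- any run whose manifest row count disagrees is exported as a bad record.
INSERT INTO csv_export
SELECT r.wf_id, m.file_name, m.row_count
FROM workflow_runs r
JOIN file_manifest m ON r.wf_id = m.wf_id
WHERE m.row_count <> r.expected_row_count;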
The OpenCSVSerde will create one or more files for the rows inserted within the same statement:
hdfs dfs -ls /user/admin/export/folder
Found 2 items
-rwx------ 3 admin admin 885 2020-04-29 05:24 /user/admin/export/folder/000000_0
You can download a single file to disk and rename it:
hdfs dfs -copyToLocal /user/admin/export/folder/000000_0
mv 000000_0 output.csv
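Since copyToLocal also accepts a local destination path, the download and rename can be done in one step:
hdfs dfs -copyToLocal /user/admin/export/folder/000000_0 output.csv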
If there is a lot of data you will get more than one file, so it is safer to cat all the files in the folder into the local file:
hdfs dfs -cat '/user/admin/export/folder/*' > output.csv
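Alternatively, HDFS has a getmerge command that concatenates every file under a path into a single local file:
hdfs dfs -getmerge /user/admin/export/folder output.csv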
The resulting output.csv will be properly formatted CSV data.