10

I know there is a known issue with delimiters when saving a table to csv (or other text file) in Hive. So I'm wondering if you guys can help me get around that.

I have an existing table (Table A) and I would like to save it in csv format to hdfs. From reading other responses I believe I would have to first create an external table (but I'm not sure how the whole thing would look).

Can anyone help?

Laura
  • Personally, I'd just use `insert overwrite directory` and have a `\001` delimited file. But if you *must* have a csv, I think you're best off piping the query to a local .csv and then `hadoop fs -put`-ing it into hdfs, which is described here http://stackoverflow.com/questions/18129581/how-do-i-output-the-results-of-a-hiveql-query-to-csv – o-90 May 14 '15 at 03:14
  • I want to avoid saving such a large file to the local directory if possible. My issue is that I then need to load the file into R and to do this you normally specify a delimiter, not sure how this would work (or if it would work) with an invisible character. – Laura May 14 '15 at 14:10
  • I load data from hdfs to spark and specify the delimiter as `"\001"`. I've never had (a reason) to import a file from HDFS to R but I'd assume it can take any delimiter as long as it is specified correctly. – o-90 May 15 '15 at 03:02
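A minimal sketch of the `insert overwrite directory` route mentioned in the comments above (the output path and table name are placeholders, not from the question). With no ROW FORMAT clause, Hive writes text files delimited by the default `\001` (Ctrl-A) character straight into HDFS:

    -- writes \001-delimited text files for Table A directly into an HDFS directory
    INSERT OVERWRITE DIRECTORY '/user/laura/table_a_export'
    SELECT * FROM table_a;

The result is one or more files (named something like 000000_0) under that directory, which R or Spark can read as long as the \001 separator is passed explicitly.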

4 Answers

12

Try this in the Hive shell:

INSERT OVERWRITE LOCAL DIRECTORY '/path/to/hive/csv'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT * FROM hivetablename;

Change '/path/to/hive/csv' to the local directory where you want the CSV files written, and hivetablename to the Hive table you want to export in CSV format.

Rajesh N
  • 3
    I think that she wants it saved on hdfs, not on the local fs. – Kim Moritz Dec 01 '17 at 07:08
  • 2
    @KimMoritz In that case, excluding `LOCAL` and specifying the HDFS path should do the trick. – Andrea Jul 13 '18 at 09:19
  • I got this error: `Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Failed to create local target directory /export` – didi Apr 14 '21 at 07:31
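Picking up on Andrea's comment, a sketch of the same statement without `LOCAL` so the comma-delimited files land directly in HDFS (the path is a placeholder; a ROW FORMAT clause on `INSERT OVERWRITE DIRECTORY` is assumed to be available, which requires Hive 0.11 or later):

    INSERT OVERWRITE DIRECTORY '/user/laura/hive_csv'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    SELECT * FROM hivetablename;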
4

This three-step process worked fine for me:

  1. in HIVE, create a new temporary table stored as textfile

    CREATE TABLE temp_table( id INT, name STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;
    
  2. also in HIVE, write the source data into the temporary table

    INSERT OVERWRITE TABLE temp_table
    SELECT id, name FROM source_table;
    
  3. From the command line, copy the Hive table's files from their location in HDFS to your local file system - inspection should reveal the files as CSV (rename accordingly if desired)

    hdfs dfs -copyToLocal /apps/hive/warehouse/temp_table/* /tmp/local_dir/
    

If you run the command SHOW CREATE TABLE temp_table in HIVE, it will tell you the exact location of the table in HDFS, e.g.

| LOCATION                                                        |
|   'hdfs://hadoop_cluster/apps/hive/warehouse/temp_table'        |
Hendrik F
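One note on step 3: since the question asks for the file in HDFS, the temp table's comma-delimited files are already in HDFS after step 2, so instead of copying to the local file system you could copy them to another HDFS directory (a sketch; the target path is a placeholder):

    hdfs dfs -mkdir -p /user/laura/table_a_csv
    hdfs dfs -cp /apps/hive/warehouse/temp_table/* /user/laura/table_a_csv/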
1

For an external table in Hive, you can follow the steps below:

  1. Create an external table in Hive

    CREATE EXTERNAL TABLE external_table( number INT, name STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LOCATION '/user/hive/external/mytable/';

  2. Load the data file from the local file system into the table's HDFS location

    hadoop fs -put /home/user1/Desktop/filename.csv /user/hive/external/mytable/

The above two steps can solve your problem.

Farooque
  • where am I inputting the existing table that I have created? I see this as creating a blank table and putting it in the local directory. Can you explain more? – Laura May 14 '15 at 18:26
  • 2
    The `LOCATION '/user/hive/external/mytable/'` value is an HDFS directory, NOT a local directory. When moving the file, i.e. `hadoop fs -put /home/user1/Desktop/filename.csv /user/hive/external/mytable/`, it goes from the local directory to the HDFS location directory – Farooque May 14 '15 at 23:28
  • My table is already stored in HDFS and can be accessed with a /user/hive... path – Laura May 18 '15 at 16:20
  • If you can access the CSV data with a SELECT query and open the file correctly with `hadoop fs -cat 'file-path'`, then just move the CSV file to whatever location you want. – Farooque May 18 '15 at 16:33
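On Laura's follow-up about the existing table: once the external table above exists, it can also be filled straight from the table that is already in Hive instead of `put`-ting a local CSV (a sketch; `source_table_a` is a placeholder for the existing table, and the selected columns must match the external table's schema):

    INSERT OVERWRITE TABLE external_table
    SELECT number, name FROM source_table_a;

The comma-delimited files then appear under the table's LOCATION, /user/hive/external/mytable/, in HDFS.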
0

Three steps you may follow:

  1. Create a temporary table

    DROP TABLE IF EXISTS Test;
    CREATE TABLE Test
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    AS SELECT * FROM ABC;  -- or any whole query statement

  2. Make a CSV from the created Hive table

    hive -e 'set hive.cli.print.header=true; select * from Test' | sed 's/[\t]/,/g' > /home/hadoop/Mycsv.csv

  3. Put the created CSV file into S3 or whatever location you want

    hdfs dfs -put -f /home/hadoop/Mycsv.csv 's3://bucketName/'

Hari_pb