1

I need to retrieve the filename.txt from a linux path and insert filename into a table column in hive. Is it possible to retrieve only file name from a path and insert it into a hive table using virtual columns? please advice!

e.g. of path /home/usr/path/filename.txt and insert filename into a table. create table t( name string);

Thanks!

Holmes
  • 1,059
  • 2
  • 17
  • 25

2 Answers2

2

if you want to run against HDFS - command - awk -F "/" '{print $NF}' will you just file name.

[cloudera@quickstart ~]$ hadoop fs -ls /user/cloudera/departments|awk -F "/" '{print $NF}'|egrep -v 'Found|_SUCCESS'
part-m-00000
part-m-00001
[cloudera@quickstart ~]$ 

if you want to run against local file system then - command - ls -1 will give you file name. you can also use awk -F "/" '{print $NF}'

you can create shell script as: (uncomment hive statements)

#!/bin/sh

files=`hadoop fs -ls /user/cloudera/departments|awk -F "/" '{print $NF}'|egrep -v 'Found|_SUCCESS'`

for file in $files
 do
   #hive -e "insert into table t(name) values (\"$file\");"
   echo "insert into table t(name) values (\"$file\");"
done

should instert in hive table:

[cloudera@quickstart ~]$ ./test.sh 
insert into table t(name) values ("part-m-00000");
insert into table t(name) values ("part-m-00001");
Ronak Patel
  • 3,819
  • 1
  • 16
  • 29
1

For example you have a full file name in variable $filename:

 #!/bin/bash
        filename=$(basename "$fullfile")

        #pass variable to the hive script:
        hive -hiveconf filename=$filename -f your_script_name.hql

Inside the script:

        insert into your_table
        select some columns,  '${hiveconf:filename}' as filename  --use variable
        from some table... ;
        or just insert values... 

Or just simply do:

hive -e "insert into t values '$filename'"
leftjoin
  • 36,950
  • 8
  • 57
  • 116