I have a list of Filename, FilePath and FileSize I want to insert these details into my hive table using spark SQL.
var fs1 = FileSystem.get(sparksession.sparkContext.hadoopConfiguration)
var file_path = fs1.listStatus(new Path("path")).filter(_.isFile).map(_.getPath).toList
var new_files = fs1.listStatus(new Path("path")).filter(_.isFile).map(_.getPath.getName).toList
var file_size = fs1.listStatus(new Path("path")).filter(_.isFile).map(_.getLen).toList
var file_data = file_path zip new_files zip file_size
for ((filedetail, size) <- file_size){
var filepath = filedetail._1
var filesize: Long = size
var filename = filedetail._2
var df = spark.sql(s"insert into mytable(file_path,filename,file_size) select '${file_path}' as file_path,'${new_files}' as filename,'${file_size}' as file_size")
df.write.insertInto("dbname.tablename")
}
It's generating this insert query:
insert into mytable(file_path,filename,file_size) select 'List(path/filename.txt,path/filename4.txt,path/filename5.txt)' as file_path,'List(filename.txt, filename4.txt, filename5.txt)' as filename,'List(19, 19, 19)' as file_size;
and I'm getting an error:
mismatched input 'file_path' expecting {'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 'INSERT', 'MAP', 'REDUCE'}(line 1, pos 34)
I want to insert data in table format
file_path filename file_size
path/filename.txt filename.txt 19
path/filename4.txt filename4.txt 19
path/filename5.txt filename5.txt 19
Can someone please suggest me how should i insert data like above?
is there any way to again split this query into 3 different insert hive statements.
insert into mytable(file_path,filename,file_size) select 'path/filename.txt' as file_path,'filename.txt' as filename,'19' as file_size;
insert into mytable(file_path,filename,file_size) select 'path/filename3.txt' as file_path,'filename3.txt' as filename,'19' as file_size;
insert into mytable(file_path,filename,file_size) select 'path/filename4.txt' as file_path,'filename4.txt' as filename,'19' as file_size;