I receive data in a file, file1.dat, whose fields are separated by the | character:
109|LK98765|2|18.07.2021|01|abc1|01|abc2|01|abc3
110|LK67665|2|10.10.1987|02|abc1|01|abc2|01|abc3
111|LK43465|2|23.07.2005|03|abc1|01|abc2|01|abc3
112|LK23265|2|13.02.2012|04|abc1|01|abc2|01|abc3
My requirement is to add a header to the file and convert it to .csv, with , as the field separator.
To achieve this, I wrote the following Python code.
To add the header:
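import csv

def fn_add_header(file_name):
    # Read all existing rows first, because the same file is rewritten below.
    with open(file_name, newline='') as f:
        r = csv.reader(f)
        data = [line for line in r]
    # Python 3: text mode with newline=''; Python 2 would use 'wb' here.
    with open(file_name, 'w', newline='') as f:
        w = csv.writer(f)
        w.writerow(['ID','SEC_NO','SEC_CD','SEC_DATE','SEC_ID1','SEC_DESC1','SEC_ID2','SEC_DESC2','SEC_ID3','SEC_DESC3'])
        w.writerows(data)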
To convert the file to CSV:
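import fnmatch
import os
import shutil

def fn_replace(filename, directory):
    # os.path.join avoids "\f" in "\file1.csv" being read as a form-feed escape.
    final_file = os.path.join(directory, "file1.csv")
    for file in os.listdir(directory):
        if fnmatch.fnmatch(file.lower(), filename.lower()):
            shutil.copyfile(os.path.join(directory, file), final_file)
            # Replace every | with , in place; fn_run_cmd is a helper defined elsewhere (not shown).
            cmd = ["sed", "-i", "-e", "s/|/,/g", final_file]
            ret2, out2, err2 = fn_run_cmd(cmd)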
The above code works fine, and I get the converted file as:
ID,SEC_NO,SEC_CD,SEC_DATE,SEC_ID1,SEC_DESC1,SEC_ID2,SEC_DESC2,SEC_ID3,SEC_DESC3
109,LK98765,2,18.07.2021,01,abc1,01,abc2,01,abc3
110,LK67665,2,10.10.1987,02,abc1,01,abc2,01,abc3
111,LK43465,2,23.07.2005,03,abc1,01,abc2,01,abc3
112,LK23265,2,13.02.2012,04,abc1,01,abc2,01,abc3
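For comparison, both steps (adding the header and changing the separator) can probably be done in a single pass with Python's csv module, without calling sed. This is only a minimal sketch, assuming Python 3; the function name convert_pipe_to_csv and its parameters are made up for illustration and are not part of the code above:

```python
import csv

# Header copied from the post; the function name and paths are illustrative.
HEADER = ['ID', 'SEC_NO', 'SEC_CD', 'SEC_DATE', 'SEC_ID1', 'SEC_DESC1',
          'SEC_ID2', 'SEC_DESC2', 'SEC_ID3', 'SEC_DESC3']


def convert_pipe_to_csv(src_path, dst_path, header=HEADER):
    """Read a '|'-separated file and write it as a ','-separated CSV with a header."""
    with open(src_path, newline='') as src, \
            open(dst_path, 'w', newline='') as dst:
        reader = csv.reader(src, delimiter='|')
        # Use '\n' line endings instead of the csv module default '\r\n'.
        writer = csv.writer(dst, lineterminator='\n')
        writer.writerow(header)
        for row in reader:
            if row:  # skip blank lines in the source file
                writer.writerow(row)
```

For example, convert_pipe_to_csv('file1.dat', 'file1.csv') should produce the same content as the listing above.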
I am facing an issue while reading the converted file1.csv through a YAML configuration. To read the file, I am using the configuration below:
frameworkComponents:
  today_file:
    inputDirectoryPath: <path of the file>
    componentName: today_file
    componentType: inputLoader
    hadoopfileFormat: csv
    csvSep: ','
  selectstmt:
    componentName: selectstmt
    componentType: executeSparlSQL
    sql: |-
      select ID,SEC_NO,
      SEC_CD,SEC_DATE,
      SEC_ID1,SEC_DESC1,
      SEC_ID2,SEC_DESC2,
      SEC_ID3,SEC_DESC3
      from today_file
  write_file:
    componentName: write_file
    componentType: outputWriter
    hadoopfileFormat: avro
    numberofPartition: 1
    outputDirectoryPath: <path of the file>
precedence:
  selectstmt:
    dependsOn:
      today_file: today_file
  write_file:
    dependsOn:
      selectstmt: selectstmt
When I run the YAML, I get the following error:
Unable to infer schema for CSV. It must be specified manually.
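As far as I know, Spark reports this error when it finds no data to read at the input path, for example when the directory is empty, contains only a zero-byte file, or contains only files whose names start with _ or . (which Spark ignores). A quick check on the input directory before running the YAML might look like the sketch below. It assumes Python 3 and a directory on the local filesystem (not HDFS); check_csv_input and EXPECTED_HEADER are hypothetical names introduced here, not part of the framework:

```python
import csv
import os

# Header copied from the post; the names below are illustrative only.
EXPECTED_HEADER = ['ID', 'SEC_NO', 'SEC_CD', 'SEC_DATE', 'SEC_ID1', 'SEC_DESC1',
                   'SEC_ID2', 'SEC_DESC2', 'SEC_ID3', 'SEC_DESC3']


def check_csv_input(input_dir, expected_header=EXPECTED_HEADER):
    """Return a list of problems found with the CSV files in input_dir."""
    if not os.path.isdir(input_dir):
        return ['not a directory: ' + input_dir]
    problems = []
    # Skip names starting with '_' or '.', which Spark treats as hidden.
    names = [n for n in sorted(os.listdir(input_dir))
             if not n.startswith(('_', '.'))
             and os.path.isfile(os.path.join(input_dir, n))]
    if not names:
        problems.append('no readable data files in ' + input_dir)
    for name in names:
        path = os.path.join(input_dir, name)
        if os.path.getsize(path) == 0:
            problems.append(name + ' is empty')
            continue
        # Compare the first row with the header the SQL statement expects.
        with open(path, newline='') as f:
            first_row = next(csv.reader(f), None)
        if first_row != expected_header:
            problems.append(name + ' has unexpected header: ' + repr(first_row))
    return problems
```

An empty list from check_csv_input(<path of the file>) would suggest that the file itself is fine and that the path the job sees (for example, HDFS versus local disk) is worth checking next.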