I have 620 CSV files, each with different columns and data. For example:
//file1.csv
word, count1
w1, 100
w2, 200
//file2.csv
word, count2
w1, 12
w5, 22
//Similarly fileN.csv
word, countN
w7, 17
w2, 28
My expected output:
//result.csv
word, count1, count2, countN
w1, 100, 12, null
w2, 200, null, 28
w5, null, 22, null
w7, null, null, 17
I was able to do it in Scala for two files like this, where df1 is file1.csv and df2 is file2.csv:
df1.join(df2, Seq("word"),"fullouter").show()
I need a solution that scales to all 620 files, either in Scala or as a Linux command.
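To make the target concrete, here is a minimal sketch of what a Linux-side solution might look like, using awk to do the full outer join on the first column. It assumes every file has a header line, exactly two comma-separated fields per row, and that all the data fits in memory. The function name `merge_counts` is hypothetical, chosen only for illustration.

```shell
#!/usr/bin/env bash
# merge_counts: hypothetical helper that full-outer-joins two-column CSV files
# on their first column. Assumes each file starts with a header line and has
# exactly two comma-separated fields per row.
merge_counts() {
  awk -F',' '
    # Strip leading and trailing blanks (and a trailing carriage return).
    function trim(s) { gsub(/^[ \t]+|[ \t\r]+$/, "", s); return s }

    FNR == 1 {                       # header line of each input file
      nfiles++
      header[nfiles] = trim($2)      # remember the name of this count column
      next
    }
    {
      w = trim($1)
      if (!(w in seen)) {            # record words in first-seen order
        seen[w] = 1
        order[++nwords] = w
      }
      value[w, nfiles] = trim($2)
    }
    END {
      line = "word"
      for (i = 1; i <= nfiles; i++) line = line ", " header[i]
      print line
      for (k = 1; k <= nwords; k++) {
        w = order[k]
        line = w
        for (i = 1; i <= nfiles; i++)
          line = line ", " (((w, i) in value) ? value[w, i] : "null")
        print line
      }
    }
  ' "$@"
}
```

It could be called as `merge_counts file1.csv file2.csv fileN.csv > result.csv`. Rows come out in the order each word is first seen, not sorted, and columns follow the order of the arguments. If a glob such as `file*.csv` is used, note that it expands lexicographically and that result.csv should be written somewhere the glob does not match.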