I want to read a TSV file, transform it into a specific pattern, and write the result back out as TSV. I tried writing the code in Python using pandas, but I cannot run it because it takes too much memory.
I want to do the same thing in Spark with Scala, but there is no melt function in Scala.
My Python code:
import pandas as pd

dir = "related_path"
file = 'file_name.tsv'
file_in = dir + file
file_out = dir + 'result.tsv'

df = pd.read_csv(file_in, sep='\t')

# Keep 'Unnamed: 0' as the id column and stack all other columns.
df1 = df.melt(id_vars='Unnamed: 0')
df1.columns = ['col1', 'col2', 'col3']
df1.index.name = 'index'
print(df1)
df1.to_csv(file_out, index=None, sep='\t', mode='a')
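For the read step I think I have the Spark Scala equivalent; a minimal sketch, assuming the placeholder path from above (Spark's CSV reader takes a `sep` option for TSV):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("melt-tsv").getOrCreate()

// Equivalent of pd.read_csv(file_in, sep='\t'): header = true promotes
// the first line of the file to column names, as pandas does by default.
val df = spark.read
  .option("sep", "\t")
  .option("header", "true")
  .csv("related_path" + "file_name.tsv")
```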
The TSV file does not contain a header.
Dataframe of the TSV file (df):

      Unnamed: 0    A-4    A-5  Unnamed: 3  A-12
index
0             AB    NaN  0.019         NaN  0.10
1             AC  0.017  0.140       0.144  0.18
2            NaN  0.050  0.400         NaN  0.17
3             AE  0.890  0.240       0.450  0.13
The line `Unnamed: 0 A-4 A-5 Unnamed: 3 A-12` (there is no header) is also a row of the file.
Output dataframe (df1):

      col1        col2   col3
index
0       AB         A-4    NaN
1       AC         A-4  0.017
2      NaN         A-4  0.050
3       AE         A-4  0.890
4       AB         A-5  0.019
5       AC         A-5  0.140
6      NaN         A-5  0.400
7       AE         A-5  0.240
8       AB  Unnamed: 3    NaN
9       AC  Unnamed: 3  0.144
10     NaN  Unnamed: 3    NaN
11      AE  Unnamed: 3  0.450
12      AB        A-12  0.100
13      AC        A-12  0.180
14     NaN        A-12  0.170
15      AE        A-12  0.130
`df.melt(id_vars='Unnamed: 0')` is the line that converts df into the output dataframe.
How can I do this in Scala, given that there is no built-in melt function? The complexity should not be O(n²).
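For reference, here is a sketch of the kind of melt I am after, written with `explode` over an array of structs. Every input row expands into one output row per value column, so the cost is linear in rows × columns, not n². The function name `melt`, the column names, and the `double` cast are my own placeholders, not a Spark API, and I'm assuming Spark names blank header cells differently from pandas:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._

// Unpivot sketch: each input row yields one output row per value column.
def melt(
    df: DataFrame,
    idVars: Seq[String],
    valueVars: Seq[String],
    varName: String = "variable",
    valueName: String = "value"): DataFrame = {
  // Build one (column-name, column-value) struct per value column,
  // then explode the array of structs into rows.
  val kvs = explode(array(valueVars.map { c =>
    struct(lit(c).alias(varName), col(c).cast("double").alias(valueName))
  }: _*))
  df.select(idVars.map(col) :+ kvs.alias("_kv"): _*)
    .select(idVars.map(col) :+ col("_kv." + varName) :+ col("_kv." + valueName): _*)
}

val spark = SparkSession.builder().appName("melt-tsv").getOrCreate()
val df = spark.read.option("sep", "\t").option("header", "true")
  .csv("related_path" + "file_name.tsv")

// Spark fills blank header cells with "_c0"-style names rather than
// pandas' "Unnamed: 0", so pick the id column by position, not by name.
val idCol = df.columns.head
val df1 = melt(df, Seq(idCol), df.columns.tail.toSeq, "col2", "col3")
  .withColumnRenamed(idCol, "col1")

df1.write.option("sep", "\t").option("header", "true").csv("related_path" + "result.tsv")
```

Is this explode-based approach the right way to reproduce pandas melt in Spark, or is there something more idiomatic?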