I am trying to load data into an abc.txt file from a .csv file that is stored in Delta Lake.

Example: data written with | as the separator in the abc.txt file

id|name|address|contact_no
1|abc|xyz1|123
2|efg|xyz2|456
3|hij|xyz3|789
4|klmn|xyz4|91011

Header data example:

Table_Name|Employee_details
Execution Date|28.07.2021
Execution Time|13:30:06
Execution Date Range|01.01.2021 To 28.07.2021
Total Number of Records Extracted|1 To 59 of 59
Key Fields: id

How can I combine these two into one .txt file using Azure Databricks with PySpark, or plain Python?
Could anyone assist here?

I need the abc.txt file to be written in the format below:

Table_Name|Employee_details
Execution_Date|28.07.2021
Execution_Time|13:30:06
Execution_Date_Range|01.01.2021 To 28.07.2021
Total_Number_of_Records_Extracted|1 To 59 of 59
Key_Fields|id
id|name|address|contact_no
1|abc|xyz1|123
2|efg|xyz2|456
3|hij|xyz3|789
4|klmn|xyz4|91011

I am able to generate two separate files, but I am not able to combine them into one file.

Bernd Wilke πφ
Sai Kiran

2 Answers

Assume the two parts are stored in File1.txt and File2.txt.

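# Read each file as a DataFrame with a single string column named "value"
df1 = spark.read.text("/FileStore/tables/File1.txt")
df2 = spark.read.text("/FileStore/tables/File2.txt")
# Stack the rows of File2 below the rows of File1
unioned = df1.union(df2)
# coalesce(1) avoids the shuffle that repartition(1) triggers.
# Spark writes a directory named File3.txt that contains one part file.
unioned.coalesce(1).write.text("File3.txt")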

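The output DataFrame, unioned, holds the lines of both files.

To save the output DataFrame as a single text file from the driver, convert it to pandas and write it with NumPy (fmt='%s' is needed because the values are strings):

import numpy as np
np.savetxt(r'data\File3.txt', unioned.toPandas().values, fmt='%s')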
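
If the data is small enough to handle on the driver, the same result can also be produced in plain Python without Spark. The following is only a minimal sketch: the function name write_report is hypothetical, the sample values are copied from the question, and the output goes to a temporary directory rather than a real DBFS location.

```python
import os
import tempfile


def write_report(path, metadata, columns, rows, sep="|"):
    """Write the metadata lines first, then the column header, then the data rows."""
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        for key, value in metadata:
            out.write(f"{key}{sep}{value}\n")
        out.write(sep.join(columns) + "\n")
        for row in rows:
            out.write(sep.join(str(field) for field in row) + "\n")


# Sample values copied from the question; a real job would compute them.
metadata = [
    ("Table_Name", "Employee_details"),
    ("Execution_Date", "28.07.2021"),
    ("Execution_Time", "13:30:06"),
    ("Execution_Date_Range", "01.01.2021 To 28.07.2021"),
    ("Total_Number_of_Records_Extracted", "1 To 59 of 59"),
    ("Key_Fields", "id"),
]
columns = ["id", "name", "address", "contact_no"]
rows = [
    (1, "abc", "xyz1", 123),
    (2, "efg", "xyz2", 456),
    (3, "hij", "xyz3", 789),
    (4, "klmn", "xyz4", 91011),
]

# Temporary location used only for this illustration.
out_path = os.path.join(tempfile.mkdtemp(), "abc.txt")
write_report(out_path, metadata, columns, rows)

with open(out_path, encoding="utf-8") as f:
    merged_lines = f.read().splitlines()
print("\n".join(merged_lines))
```

Because everything is written through one file handle, the metadata block always precedes the data rows. For a Spark DataFrame, the rows could be obtained with collect() or toPandas(), which is only reasonable when the data fits in driver memory.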
Abhishek K

In the same scenario as above, after merging I want to read the final .txt file with a pipe separator, but all the text ends up in the first column and the other columns are null. How should I specify the pipe separator correctly? The code I use: df = spark.read.format("csv").options(header=True, sep="|").schema(schema_fields).load(file_path)

Example of what I get:

pqr|abc|jkl|rst|xyz     Null    Null    Null

The output I want when reading the file is:

abc                     a       b       c     d      e
pqr|abc|jkl|rst|xyz     pqr    abc     jkl    rst    xyz
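
One thing that may complicate a plain CSV read of the merged file is that the metadata lines have two fields while the data rows have more. As a rough sketch in plain Python (not Spark), the metadata block can be split off before the rest is parsed with | as the delimiter. The function name read_report is hypothetical, and the sketch assumes the file starts with exactly six key|value metadata lines, as in the format shown in the question.

```python
import csv
import io


def read_report(text, n_metadata_lines=6, sep="|"):
    """Split the text into a metadata dict and a list of row dicts.

    Assumes the first n_metadata_lines lines are key|value pairs and the
    next line is the column header of the data block.
    """
    lines = text.splitlines()
    metadata = dict(line.split(sep, 1) for line in lines[:n_metadata_lines])
    data_block = io.StringIO("\n".join(lines[n_metadata_lines:]))
    records = list(csv.DictReader(data_block, delimiter=sep))
    return metadata, records


# Sample content in the merged format from the question.
sample = """Table_Name|Employee_details
Execution_Date|28.07.2021
Execution_Time|13:30:06
Execution_Date_Range|01.01.2021 To 28.07.2021
Total_Number_of_Records_Extracted|1 To 59 of 59
Key_Fields|id
id|name|address|contact_no
1|abc|xyz1|123
2|efg|xyz2|456
3|hij|xyz3|789
4|klmn|xyz4|91011
"""

metadata, records = read_report(sample)
print(metadata["Table_Name"])
for record in records:
    print(record)
```

All values come back as strings here; any type conversion would be a separate step.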

#apacheSpark

SomeSh