We receive a fixed-width file that has multiple headers and multiple sections, i.e. data about the subgroups of a company.
The first record is the organization, followed by N sections for the subgroups of the company operating around the world.
Below is the data:
5512345worldwidenetwork123449
6634455australiannetwok123455
8823455 austestcrickifoxvf@abc.aus.co12344444
8878557 syndenycrickifoxvf@abc.aus.co45345444444
8834888 melbourcecricoxvf@abc.aus.co57645344444
99345676455
66987455uitedkingdomnetwok123455
8893855 uktestcrickixvf@zxc.uk.co87665559
8878557 londoncrickilondon@uk.co45345444444
8834889 machestermanctiry@man.uk.co57645344444
99788834
0000000
The data is organized as follows:
- The top row contains the organization name (row starting with 55; only one record per file)
(The section below repeats N times)
- The second row contains details of the company operating in one country (row starting with 66)
- The next N records contain data for the company's franchises operating in separate cities (rows starting with 88)
- A record indicating the end of the subgroup data (row starting with 99)
- A record indicating the end of the file (row starting with 00; only one record per file)
I am mostly interested in the records that start with 88. What would be the right way to get this data? I tried the option below, with no luck:
rdd1=sqlContext.read.text("file:///home/mahesh/source/network.txt");
df = rdd1.filter(lambda x: "21" in x).map(lambda x: x).toDF()
How can I filter only the records starting with 88, or is there another way to load this data?
Below is the expected output:
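One possible way to keep only the rows beginning with 88 is sketched below. It assumes the file is read with `read.text`, which produces a DataFrame with a single string column named `value`. The names `is_franchise_record` and `load_franchise_rows` are hypothetical helpers introduced only for illustration, and the Spark import is deferred so the plain-Python prefix check can be tried without a cluster.

```python
RECORD_TYPE_FRANCHISE = "88"  # record-type prefix for franchise rows, per the layout in the question


def is_franchise_record(line):
    """Return True when the line starts with the franchise record type."""
    return line.startswith(RECORD_TYPE_FRANCHISE)


def load_franchise_rows(sql_context, path):
    """Hypothetical helper: read the file as text and keep only the 88 rows.

    read.text() yields a DataFrame with one string column named "value",
    so the filter is expressed on that column rather than with a lambda.
    """
    # Imported here so the pure-Python check above runs without Spark installed.
    from pyspark.sql.functions import col

    lines = sql_context.read.text(path)
    return lines.filter(col("value").startswith(RECORD_TYPE_FRANCHISE))
```

With the existing `sqlContext`, this could be called as `load_franchise_rows(sqlContext, "file:///home/mahesh/source/network.txt")`. Matching on the prefix rather than with `in` matters here, because a trailer such as `99788834` contains `88` without starting with it.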
8823455 austestcrickifoxvf@abc.aus.co12344444
8878557 syndenycrickifoxvf@abc.aus.co45345444444
8834888 melbourcecricoxvf@abc.aus.co57645344444
8893855 uktestcrickixvf@zxc.uk.co87665559
8878557 londoncrickilondon@uk.co45345444444
8834889 machestermanctiry@man.uk.co57645344444
Once I can filter the data down to the rows starting with 88, I can apply fixed-width parsing to get the data below:
FranchiseID FranchiseName Email ID No of subscribers
8823455 austestcrickif oxvf@abc.aus.co 12344444
8878557 syndenycrickifo xvf@abc.aus.co 45345444444
8834888 melbourcecric oxvf@abc.aus.co5 7645344444
8893855 uktestcricki xvf@zxc.uk.co 87665559
8878557 londoncricki london@uk.co 45345444444
8834889 machester manctiry@man.uk.co 57645344444
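The fixed-width step could then be sketched with plain string slicing. The offsets in `FRANCHISE_LAYOUT` are an assumption: they are chosen to reproduce the first sample row only, and the real column positions have to come from the file specification. `parse_franchise_line` is likewise a hypothetical name used for illustration.

```python
# Hypothetical layout: (field name, start offset, end offset), zero-based, end exclusive.
# ASSUMPTION: these offsets match the first sample row only; replace them with the real spec.
FRANCHISE_LAYOUT = [
    ("franchise_id", 0, 7),
    ("franchise_name", 8, 22),
    ("email", 22, 37),
    ("subscribers", 37, None),  # None slices to the end of the line
]


def parse_franchise_line(line, layout=FRANCHISE_LAYOUT):
    """Slice one 88 record into a dict of stripped field values."""
    return {name: line[start:end].strip() for name, start, end in layout}


row = parse_franchise_line("8823455 austestcrickifoxvf@abc.aus.co12344444")
print(row["franchise_id"], row["franchise_name"], row["email"], row["subscribers"])
```

On a DataFrame of filtered lines, the same function could be applied per row, for example with `df.rdd.map(lambda r: parse_franchise_line(r.value))`, once the layout is confirmed.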