Here is an example of solving your task using str.rpartition(). I reimplemented the Max() and LJust() helpers because you are working with PySpark, which provides its own, different implementations of the built-in max() and str.ljust().

After running the code you can use res2 or res3 further in your own code: res2 contains all rows in the format [source, extracted], while res3 contains just the extracted values.
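For reference, str.rpartition('/') splits a string into a (head, separator, tail) tuple around the last occurrence of '/', so taking the last element gives everything after the final slash:

path = '/dbfs/mnt/abc/date=20210225/fsp_store_abcxyz_lmn_'
print(path.rpartition('/'))      # ('/dbfs/mnt/abc/date=20210225', '/', 'fsp_store_abcxyz_lmn_')
print(path.rpartition('/')[-1])  # 'fsp_store_abcxyz_lmn_'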
Try it online!
def Max(l):
    # Plain-Python maximum, used instead of the built-in max()
    m = None
    for e in l:
        if m is None or e > m:
            m = e
    return m

def LJust(s, n):
    # Pad s with spaces on the right to length n (like str.ljust)
    return s if len(s) >= n else s + ' ' * (n - len(s))

l = [
    '/dbfs/mnt/abc/date=20210225/fsp_store_abcxyz_lmn_',
    '/dbfs/mnt/abc/date=20210225/fsp_store_schu_lev_bsd_s_',
]

# Keep only the part after the last '/'
res = [e.rpartition('/')[-1] for e in l]

# res2 pairs each source path with its extracted value
res2 = [[e0, e1] for e0, e1 in zip(l, res)]

# Pretty-print a two-column table, padding sources to the longest length
maxl = Max([len(e) for e in l])
print(LJust('Source', maxl) + ' Extracted')
print('\n'.join([LJust(s, maxl) + ' ' + d for s, d in res2]))

# res3 contains just the extracted values
res3 = [e1 for e0, e1 in res2]
Output:
Source                                               Extracted
/dbfs/mnt/abc/date=20210225/fsp_store_abcxyz_lmn_     fsp_store_abcxyz_lmn_
/dbfs/mnt/abc/date=20210225/fsp_store_schu_lev_bsd_s_ fsp_store_schu_lev_bsd_s_
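If your paths live in a Spark DataFrame column rather than a plain Python list, here is a minimal sketch of the same extraction using pyspark.sql.functions (the DataFrame df and the column names source/extracted are assumptions for illustration, not part of your original code):

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Hypothetical DataFrame holding the source paths in a column named 'source'
df = spark.createDataFrame([(p,) for p in l], ['source'])

# substring_index(col, '/', -1) keeps everything after the last '/',
# which mirrors what rpartition('/')[-1] does on the Python side
df2 = df.withColumn('extracted', F.substring_index(F.col('source'), '/', -1))
df2.show(truncate=False)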