I have a dataframe with two columns: filename and year. I want to replace the year value in filename with the value from the year column.
The third column in the table below demonstrates the requirement:
+----------------------------+------+----------------------------+
| filename | year | reqd_filename |
+----------------------------+------+----------------------------+
| blah_2020_v1_blah_blah.csv | 1975 | blah_1975_v1_blah_blah.csv |
+----------------------------+------+----------------------------+
| blah_2019_v1_blah_blah.csv | 1984 | blah_1984_v1_blah_blah.csv |
+----------------------------+------+----------------------------+
My code currently looks like this:
df = df.withColumn('filename', F.regexp_replace(F.col('filename'), '(blah_)(.*)(_v1.*)', <Nothing I put here works>))
In short, I want to replace the second group with the value of the year column from df.