I need to replace periods in file names with underscores in three different columns of a dataframe.
Example:
http://myserver.org/dir/file-name.with_punct.pdf
Needs to be (to match what's actually on the server):
http://myserver.org/dir/file-name_with_punct.pdf
Where I'm at (solution 1):
import os

for index, row in df.iterrows():
    file1 = df['file'][index]
    file2 = df['image'][index]
    file3 = df['Item Type Metadata:Filename'][index]
    for f in [file1, file2, file3]:
        dirname = os.path.dirname(f)
        fname = os.path.basename(f)
        name, ext = os.path.splitext(fname)
        # replace only the first period in the stem; the extension is kept
        name = name.replace('.', '_', 1)
        name += ext
        # rebinds the local name f only; the dataframe is not updated
        f = dirname + name
If I print f, everything looks right, but the changes aren't made "in place". How can I apply them permanently to the dataframe?
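The "not in place" behaviour can be reproduced without pandas, which may help show where the result gets lost. This is only a minimal sketch: `files` is a hypothetical stand-in for one dataframe column, and the transformation is simplified to replacing the first period in the whole string.

```python
# Hypothetical stand-in for one dataframe column.
files = ["a.b.pdf", "c.d.pdf"]

# Assigning to the loop variable rebinds the name f and nothing else.
for f in files:
    f = f.replace(".", "_", 1)
after_rebinding = list(files)      # still the original values

# Storing the result back by position is what makes the change persist.
for i, f in enumerate(files):
    files[i] = f.replace(".", "_", 1)
after_write_back = list(files)     # now holds the modified values
```

In pandas the analogous write-back would be assigning the transformed values to the column again, or setting individual cells with something like `df.at[index, 'file'] = new_value` inside the loop.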
Solution 2 (with a function):
import os

def hrefReplace(f):
    dirname = os.path.dirname(f)
    fname = os.path.basename(f)
    name, ext = os.path.splitext(fname)
    name = name.replace('.', '_', 1)
    name += ext
    f = dirname + name
    return f

# only the third column goes through .apply(); the first two are assigned back unchanged
df['file'], df['image'], df['Item Type Metadata:Filename'] = df['file'], df['image'], df['Item Type Metadata:Filename'].apply(hrefReplace)
It works if the file does not have a directory path:
E.g. file_with.period.pdf becomes file_with_period.pdf
However, if the file has a path:
http://myserver.org/file/file_with.period.pdf
No change is made.
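Two details of Solution 2 look like plausible causes; this is a guess from the code shown, not something confirmed on the real data. First, the dirname comes back without its trailing "/", so gluing it to the name loses the separator. Second, in the tuple assignment the method call attaches only to the last expression. The sketch below uses `posixpath` (which is what `os.path` is on Linux/macOS) so it behaves the same on any platform; the short strings in the tuple assignment are made-up placeholders.

```python
import posixpath  # os.path is this module on POSIX systems

url = "http://myserver.org/file/file_with.period.pdf"

dirname = posixpath.dirname(url)      # directory part, no trailing "/"
fname = posixpath.basename(url)
name, ext = posixpath.splitext(fname)
# Concatenating without a separator fuses the directory and the file name.
rebuilt = dirname + name.replace(".", "_", 1) + ext

# Tuple assignment: .replace() is called on the last string only,
# so a and b receive their values unchanged.
a, b, c = "x.1", "y.1", "z.1".replace(".", "_")
```

If that reading is right, the 'file' and 'image' columns would come out untouched, and the third column would be changed but with the "/" between directory and file name missing.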
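Working version (using the concept from fsimmonjetz's answer):
import os

def hrefReplace(f):
    # os.path.dirname() drops the trailing separator, so add it back
    # (note: a bare file name with no directory would gain a leading "/")
    dirname = os.path.dirname(f) + "/"
    fname = os.path.basename(f)
    name, ext = os.path.splitext(fname)
    # replace only the first period in the stem; the extension is kept
    name = name.replace('.', '_', 1)
    name += ext
    f = dirname + name
    return f

# apply the function to each column separately and assign the result back
df['Item Type Metadata:Filename'] = df['Item Type Metadata:Filename'].apply(hrefReplace)
df['file'] = df['file'].apply(hrefReplace)
df['image'] = df['image'].apply(hrefReplace)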
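One possible refinement, offered as a sketch rather than a tested replacement: since the working version prepends "/" unconditionally, a value with no directory part would end up with a leading slash. Splitting and re-joining with `posixpath` avoids that. The name `href_replace` is a hypothetical variant of the function above, and it assumes every value is a "/"-separated string.

```python
import posixpath

def href_replace(f):
    """Replace the first period in the file stem; keep any directory part."""
    dirname, fname = posixpath.split(f)
    name, ext = posixpath.splitext(fname)
    # join() inserts "/" only when dirname is non-empty
    return posixpath.join(dirname, name.replace(".", "_", 1) + ext)

with_path = href_replace("http://myserver.org/file/file_with.period.pdf")
bare = href_replace("file_with.period.pdf")
no_extra_period = href_replace("http://myserver.org/file/plain.pdf")
```

With the dataframe from above it could be applied one column at a time, for example `for col in ['file', 'image', 'Item Type Metadata:Filename']: df[col] = df[col].apply(href_replace)`.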
Thanks!