
I need to remove periods from file names in three different columns in a dataframe.

Ex. http://myserver.org/dir/file-name.with_punct.pdf

Needs to be ( to match what's actually on the server ): http://myserver.org/dir/file-name_with_punct.pdf

Where I'm at ( solution 1 ):

for index, row in df.iterrows():
    file1 = df['file'][index]
    file2 = df['image'][index]
    file3 = df['Item Type Metadata:Filename'][index]
    for f in [file1, file2, file3]:
        dirname = os.path.dirname(f)
        fname = os.path.basename(f)
        name, ext = os.path.splitext(fname)
        name = name.replace('.', '_', 1)
        name += ext
        f = dirname + name

If I print everything looks right -- but the changes aren't "in place". How can I apply these permanently to the dataframe?

Solution 2 ( with function ):

def hrefReplace(f):
    dirname = os.path.dirname(f)
    fname = os.path.basename(f)
    name,ext=os.path.splitext(fname)
    name = name.replace('.','_',1)
    name += ext
    f = dirname + name
    return f

df['file'], df['image'], df['Item Type Metadata:Filename'] = df['file'], df['image'], df['Item Type Metadata:Filename'].apply(hrefReplace)

It works if the file does not have a directory path:

Eg: file_with.period.pdf Becomes: file_with_period.pdf

However, if the file has a path: http://myserver.org/file/file_with.period.pdf

No change is made.

Working version ( using concept in answer from fsimonjetz )

def hrefReplace(f):
    dirname = os.path.dirname(f) + "/"
    fname = os.path.basename(f)
    name,ext=os.path.splitext(fname)
    name = name.replace('.','_',1)
    name += ext
    f = dirname + name
    return f

df['Item Type Metadata:Filename'] = df['Item Type Metadata:Filename'].apply(hrefReplace)
df['file'] = df['file'].apply(hrefReplace)
df['image'] = df['image'].apply(hrefReplace)

Thanks!

EA Bubnoff
  • Given link is not accessible – Abhi Jul 30 '21 at 16:37
  • The link isn't live or real -- I just gave that as a template showing the issue. I need to change the period to an underscore in the file names in three columns. – EA Bubnoff Jul 30 '21 at 16:45
  • Oh, you wrote it as if it were an example link to some doc you were sharing. Anyway, you can add the result to a new column or an existing column like `df[column_name] = f` – Abhi Jul 30 '21 at 16:48
  • Sorry about the links. Fixed. – EA Bubnoff Jul 30 '21 at 17:05
  • Yeah! :) It should also work if you do `df[['file', 'image', 'Item Type Metadata:Filename']] = df[['file', 'image', 'Item Type Metadata:Filename']].apply(hrefReplace)`, but there's nothing wrong with what you have. Glad to help! – fsimonjetz Jul 30 '21 at 17:37

1 Answer


If you have a dataframe like this

>>> df
      file        image             type
0  A.B.txt  foo.bar.jpg  hello.world.pdf
1  B.C.txt  bar.baz.jpg      how.are.pdf
2  C.D.txt  baz.foo.jpg    you.today.pdf

you can apply custom functions to the whole thing at once, e.g.,

>>> df = df.apply(lambda x:x.str.replace('.', '_', 1, regex=False))
>>> df
      file        image             type
0  A_B.txt  foo_bar.jpg  hello_world.pdf
1  B_C.txt  bar_baz.jpg      how_are.pdf
2  C_D.txt  baz_foo.jpg    you_today.pdf
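
One caveat: `str.replace('.', '_', 1)` replaces the first period in the whole string, so on a full URL like the one in the question it would hit the period in the host name rather than the one in the file name. A minimal sketch of one way around that, assuming the values are plain strings with forward-slash separators. The helper name `replace_first_period` and the sample data are made up for illustration:

```python
import posixpath

import pandas as pd


def replace_first_period(url):
    """Replace the first period in the file-name stem only."""
    dirname, fname = posixpath.split(url)    # split at the last "/"
    stem, ext = posixpath.splitext(fname)    # ext is the final ".pdf", ".jpg", ...
    fname = stem.replace(".", "_", 1) + ext
    # posixpath.join returns just the file name when dirname is empty
    return posixpath.join(dirname, fname)


# Hypothetical sample data, modelled on the question
df = pd.DataFrame({
    "file": ["http://myserver.org/dir/file-name.with_punct.pdf",
             "file_with.period.pdf"],
    "image": ["http://myserver.org/img/foo.bar.jpg", "baz.jpg"],
})

cols = ["file", "image"]
# DataFrame.apply hands each column to the lambda as a Series;
# Series.map then calls the helper once per value.
df[cols] = df[cols].apply(lambda col: col.map(replace_first_period))
```

Because `DataFrame.apply` passes whole columns, a function that expects a single string has to be wrapped with `Series.map` (or applied to one column at a time, as in the question's working version).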

iterrows() is rarely, if ever, the preferred way to work with dataframes (see here).
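
As for why the loop in the question changes nothing: `f = dirname + name` only rebinds the local name `f`, and the dataframe is never written to. If a row-wise loop is really wanted, the new value has to be assigned back explicitly, for example with `df.at`. A minimal sketch on made-up data, with a simplified stand-in for the renaming step:

```python
import pandas as pd

df = pd.DataFrame({"file": ["a.b.txt", "c.d.txt"]})  # hypothetical data

# Rebinding the loop variable leaves the dataframe untouched
for index, row in df.iterrows():
    f = row["file"]
    f = f.replace(".", "_", 1)
unchanged = df["file"].tolist()

# Writing back through df.at makes the change stick
for index in df.index:
    df.at[index, "file"] = df.at[index, "file"].replace(".", "_", 1)
changed = df["file"].tolist()
```

This is only meant to show the mechanics; the column-wise `apply` approach above is usually shorter and faster.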

fsimonjetz
  • Thank you. I'm beginning to see your point about iterrows. I wrote a function to do what I need. Guessing I can use apply with that. Will test. – EA Bubnoff Jul 30 '21 at 17:10