-3

I've read through the threads shown in below:

Renaming filenames using python

Replacing Filename characters with python

But they are not exactly what I am looking for.

What I am trying to accomplish here is to rename files while converting them from Excel into csv. My conversion code works, BUT I also want to get rid of the unnecessary words in my output file names.

Let's say my file names are:

"Sample_file_2016-4-30.xlsx", "Hello_world_2014-5-30.xlsx", "Great_day_2015-1-14.xlsx"

I want my output to be (all characters before the numbers to be deleted):

"2016-4-30.csv", "2014-5-30.csv", "2015-1-14.csv"

Here's what I've already done (and the code works):

def xslx_to_csv():
    files = os.listdir(r"~\files to be converted")
    current_path = os.getcwd()
    os.chdir(r"~\files to be converted")

    for file in files:
        print file
        filename = os.path.splitext(file)[0]

        wb = xlrd.open_workbook(file)
        sh = wb.sheet_by_index(0)
        new_ext = 'csv'
        new_name = (filename, new_ext)
        csvfile = open(".".join(new_name), 'wb')
        wr = csv.writer(csvfile, quoting=csv.QUOTE_ALL)

        for rownum in xrange(sh.nrows):
            wr.writerow(sh.row_values(rownum))

        csvfile.close()

However, this code only gives me the output as following:

"Sample_file_2016-4-30.csv", "Hello_world_2014-5-30.csv", "Great_day_2015-1-14.csv"

What i've tried so far: I've tried using os.rename(), and str.replace() (as suggested by Djizeus), and I've also tried using static string position, e.g.: new_name[14:35] to get the partial name.

But I need a more dynamic method. How to recognize and remove all characters before the numbers in format of yyyy-mm-dd?

Bonus question: I want to take this a bit further, instead of just REMOVING the extra parts from the file names, I wonder how can I ALTER the file names. For example, in this case, the desired output could be:

"Bonus_file_2016-4-30.csv", "Bonus_file_2014-5-30.csv", "Bonus_file_2015-1-14.csv"

So basically, I want to replace the beginning words with a certain word like "Bonus".

Community
  • 1
  • 1
alwaysaskingquestions
  • 1,595
  • 5
  • 22
  • 49
  • `new_name.replace('Sample', 'Bonus')` – Djizeus Apr 06 '16 at 07:02
  • Please do some research before posting... – Djizeus Apr 06 '16 at 07:04
  • hi @Djizeus your answer may fit for this example, but how can i make it more dynamic? what if my files do not start with the same string "Sample"? – alwaysaskingquestions Apr 06 '16 at 07:08
  • Then you should first look at the Python string documentation, see the available methods and what you can do with them. If with that you are not able to do what you like or make them work, then you can ask here and tell us exactly what you are trying to do (format of the filename and expected output), and what you tried so far. See also: [mcve]. – Djizeus Apr 06 '16 at 07:28
  • @Djizeus thank you so much for your comments. I have read through them and added a section of what i have tried so far. does this meet the standard now? Like I said, I'm completely new to Python, not really good at doing researching on it yet. thank you for your patience with me – alwaysaskingquestions Apr 06 '16 at 07:37
  • Is the date always preceded with the `_` character? – Djizeus Apr 06 '16 at 07:42
  • hi @Djizeus not necessarily. I wanted to make it more dynamic. i was thinking more on the line of how to detect the "yyyy-mm-dd" format in the name and only keep that part. – alwaysaskingquestions Apr 06 '16 at 07:59
  • @alwaysaskingquestions Do all the files end in "yyyy-mm-dd.xlsx like" `... 2016-4-30.xlsx`? – jDo Apr 06 '16 at 08:10

1 Answers1

1

When slicing based on fixed indexes or replacing known substrings is not flexible enough for your needs, you have to resort to regular expressions. It is in itself a vast and fairly complex subject, in essence they are mini-programs that you can use to search into strings.

In your particular case, you can use for example this regular expression: \d{4}-\d{1,2}-\d{1,2}$. It means:

  • \d{4}: 4 digits,
  • -: followed by a dash,
  • \d{1,2}: followed by 1 or 2 digits,
  • -: followed by a dash,
  • \d{1,2}: followed by 1 or 2 digits,
  • $: followed by the end of the string.

You would use it like this:

import re

# Compile the regular expression
# r'' is to give a raw string and avoid escaping \ characters
prog = re.compile(r'\d{4}-\d{1,2}-\d{1,2}$')

#Search the regular expression in filename
res = prog.search(filename)

#This gives you the start position of the date
#(assuming all filenames end with a date)
date_start = res.start()
new_name = 'Bonus_file_%s.csv' % filename[date_start:]
Djizeus
  • 4,161
  • 1
  • 24
  • 42
  • ah thank you so much!!! this is exactly what i need! so a side question, you dont have to answer me if you think its a really stupid question, but why use compile() first? – alwaysaskingquestions Apr 08 '16 at 03:55
  • It's not a stupid question :) Actually you don't need, you could also use the shortcut `res = re.search(r'\d{4}-\d{1,2}-\d{1,2}$', filename)`. I did it out of habit, but compiling is useful only when you are going to use the same regular expression multiple times. By compiling it, your program will parse the expression only once, and thus save time. C.f. the Python documentation for more details. – Djizeus Apr 08 '16 at 07:02
  • Okay I understand it now! It's like saving it as a variable/object first, then you can use it over and over again without typing out all the details. Thank you so much! – alwaysaskingquestions Apr 08 '16 at 20:51