0

I’ve got an .xlsx file with a certain structure (check the image please) and I would like it to process the containing information to a certain folder structure via mkdir, the only thing I have is an .xlsx and the name of the folder is a String from the first 3 contens of the .xlsx (maybe I must work with VBA?):

At the end of the process, there should be 3 new folders with a picture and .txt file inside

Here it is what it sould look like in the end --> workflow and folder structure

The URL of an .png should be dropped into that created folder, as well as one single .txt file containing the two ProdDec (Production Description) and Collection content.

I have python 3.6.x installed and PowerShell is also on my win7 64x machine.

Many thanks

Rainer Zufall

Rainer Zufall
  • 55
  • 1
  • 7
  • 1
    Have a look at [this post](https://stackoverflow.com/questions/22169325/read-excel-file-in-python) and [openpyxl](https://openpyxl.readthedocs.io/en/stable/) and then give an example of some code you would write so we can help you better – cwahls May 19 '18 at 02:03
  • Thank you for the links, I muts say, I have been there before, but it really does not help, its not generating any new folders (mkdir). Well actually I have built an .xlsx File and I would like to generate via mkdir folders. The name of the new generated folders should be the first 3 lines on the excel (as you can see on the picture), also I would like to save the .png directly in this folder, not an url. Also the .txt should be deployed into the new folder. Right now i am trying to get a code together, but I really don't understand what i am doing – Rainer Zufall May 19 '18 at 13:06

1 Answers1

0

You can read in your Excel file using openpyxl, gather the appropriate information, and then create a directory for each product. The images can be downloaded using the requests library. Below is a tested solution to your problem:

import os

import openpyxl
import requests

wb = openpyxl.load_workbook('A1.xlsx')

sheet = wb.active

rows = [tuple(cell.value for cell in row if cell.value is not None) for row in sheet] # convert the cells to text

dirnames = list()
images = list()
text = list()

for row in rows[1:]: # [1:], ignore column headers for better looping
    if row[0] is not None:
        dirnames.append('_'.join(row[:3])) # joins the Brand, Family, and Ref columns
        images.append(row[3:-2]) # stores the uris in a tuple
        text.append('\r\n'.join(row[-2:])) # joins the last two columns

for i in range(len(dirnames)):
    if not(os.path.exists(dirnames[i])):
        os.mkdir(dirnames[i]) # create the dir
    os.chdir(dirnames[i]) # enter the dir
    print('creating', dirnames[i])
    for j, image in enumerate(images[i]): # for all of the images
        imagereq = requests.get(image)
        imagename = 'Img{}.png'.format(j + 1)
        if imagereq.status_code == 200: # prevents filewriting errors for bad requests
            with open(imagename, 'wb') as fp:
                fp.write(imagereq.content)
            print(' ' * 4 + 'image write successful for', imagename)
        else:
            print(' ' * 4 + 'could not download image {}, error'.format(imagename), imagereq.status_code, imagereq.reason)
    with open('ProdDesc_and_Collection.txt', 'wb') as fp:
        fp.write(text[i].encode('utf8'))

    os.chdir('..') # back out of the dir

UPDATE: Code now works for multiple images per product and ignores empty cells

cwahls
  • 743
  • 7
  • 22
  • Hi Clay, wow, thank you, I could never have thought of that! I have two more feedbacks: 1) Line 17 (dirnames.append) gives me an Error: TypeError: sequence item 0: expected str instance, NoneType found. 2) What if I have 2 or more images to store? do I just cope the section of "if imagereq" to "print('could not download image, error'..." – Rainer Zufall May 20 '18 at 11:16
  • 1) Which one of your cells blank? 2) Do you have additional columns or do the links have a delimiter within the cell? – cwahls May 20 '18 at 17:13
  • I just uploaded the .xlsx on http://www.dongiatti.com/wp-content/uploads/2018/05/For_Clay.xlsx actually no no cell is blank, or am I missing something? No delimiter – Rainer Zufall May 20 '18 at 17:45
  • Thank you so much for your work! Let me know If I can do you a favor in any kind ,except coding :) – Rainer Zufall May 20 '18 at 20:56
  • Hi Clay, sorry to reach out for you again, my final file does not go through: text.append('\n'.join(row[-2:])) # joins the last two columns TypeError: sequence item 0: expected str instance, NoneType found I tried several things, recreat a new file, searching vo delimiters etc. http://www.dongiatti.com/wp-content/uploads/2018/05/A1.xlsx – Rainer Zufall May 21 '18 at 11:29
  • @RainerZufall Done – cwahls May 21 '18 at 17:48
  • Hi Clay, sorry for the interruption, to hope this is the last time. I am trying to loop the image names and text names, as “Brand_Family_Ref_1” i.e. “Cartier_Baignoire_WB520003_1.png”. If there are multiple pictures it goes_2, _3 etc. This also for the .txt: “Brand_Family_Ref_info_en” i.e. “Cartier_Baignoire_WB520003_info_en.txt”. I am really sorry to bother you again…I tried imagename = dirnames{}.png.format(j + 1) and I tried many more, but I am just messing up the code. – Rainer Zufall May 21 '18 at 23:29
  • Try `dirname` as the first formatting argument or just concatenate. Ex. `'{}_{}.png'.format(dirname, j + 1)` or `dirname + '_{}.png'.format(j + 1)` – cwahls May 22 '18 at 05:26
  • Thank you Clay! imagename = dirnames[i] + '_{}.png'.format(j + 1) did the trick...also i tried the sam ething for the text: for texts in enumerate(text): # for all of the images textname = dirnames[i] + '_info_en.txt' with open(textname, 'wb') as fp: fp.write(text[i].encode('utf8')) I really learned the trick! Thank you Clay, if you wish a shipment from www.dongiatti.com, please give me a pm! Thank you! – Rainer Zufall May 22 '18 at 09:38
  • I feel like a complete fool. If I delete the .xlsx and and have all the folder and files in there, how do I get the .xlsx back in the same structure? Maybe it’s an overkill….I tried something with “import xlsxwriter” and “workbook = xlsxwriter.Workbook('Cartier_reverse.xlsx') – Rainer Zufall May 22 '18 at 16:06
  • Did you delete the file on accident? It should be in your recycle bin – cwahls May 22 '18 at 18:33
  • I wish :) no I have the file from Cartier still. The problem is, for 98% I get the .xlsx out and now a friend provied me direct the folder structure...and I need always both types (.xlsx & Folder) and now i am working on an reverse engineering to get the .xlsx out. from the folder structure -.- – Rainer Zufall May 22 '18 at 19:09
  • Yeah, that's the right approach. List the folders and create the list of tuples I created in the forward version. The write the contents after the headers. Are the number of images the same for each of the products? – cwahls May 22 '18 at 19:19
  • ok I will try, yes all the files are the same, same structure. I will try it with the Cartier, so i don't mess up the code with new information – Rainer Zufall May 22 '18 at 19:40
  • May I show you the current code somewhere? If its too much work to correct it, i might just go to freelance.com oder upwork, or also send you the funds – Rainer Zufall May 23 '18 at 11:00