
I have a large number of files that are named according to progressively more specific criteria.

Each part of the filename, separated by '_', corresponds to a progressively narrower categorization of that file.

The naming convention looks like this:

TEAM_STRATEGY_ATTRIBUTION_TIMEFRAME_DATE_FILEVIEW

What I am trying to do is iterate through all these files and pull out a list of the different values that occur for each part of the naming convention.

This is what I've done so far: I iterated through all the files and made a list of each name. I then split each name on '_' and appended each part to its respective category list.
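
As a minimal sketch of that splitting step: the file names and field labels below are made up for illustration, and it assumes no individual part contains an underscore itself. A Counter per field keeps both the distinct values and how often each one occurs.

```python
from collections import Counter

# Hypothetical file names following TEAM_STRATEGY_ATTRIBUTION_TIMEFRAME_DATE_FILEVIEW.
names = [
    "TEAM1_STRATEGY1_ATTRIBUTION1_DAILY_20200101_SUMMARY",
    "TEAM1_STRATEGY2_ATTRIBUTION1_WEEKLY_20200101_DETAIL",
    "TEAM1_STRATEGY2_ATTRIBUTION3_DAILY_20200101_SUMMARY",
]

# Illustrative labels for the six positions in the name.
FIELDS = ["team", "strategy", "attribution", "time_frame", "date", "file_view"]

def collect_categories(names):
    """Return one Counter per field, keyed by the values seen in that position."""
    counters = {field: Counter() for field in FIELDS}
    for name in names:
        parts = name.split("_")
        if len(parts) != len(FIELDS):
            continue  # skip names that do not match the six-part convention
        for field, part in zip(FIELDS, parts):
            counters[field][part] += 1
    return counters

counters = collect_categories(names)
# The distinct values per field, as sets like the ones shown further down.
distinct = {field: set(counter) for field, counter in counters.items()}
print(distinct["strategy"])
print(counters["strategy"]["STRATEGY2"])
```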

Now I'm trying to export them to a CSV file with one category per column, and this is where I'm running into problems:

L = [teams, strategies, attributions, time_frames, dates, file_types]

columns = zip(*L)
list(columns)

with open (_outputfolder_, 'w') as f:
    writer = csv.writer(f)
    for column in columns:
        print(column)

This is a rough approximation of the list I'm getting out:

[{'TEAM1'}, 
{'STRATEGY1', 'STRATEGY2', 'STRATEGY3', 'STRATEGY4', 'STRATEGY5', 'STRATEGY6', 'STRATEGY7', 'STRATEGY8', 'STRATEGY9', 'STRATEGY10','STRATEGY11', 'STRATEGY12', 'STRATEGY13', 'STRATEGY14', 'STRATEGY15'}, 
{'ATTRIBUTION1','ATTRIBUTION1','Attribution3','Attribution4','Attribution5', 'Attribution6', 'Attribution7', 'Attribution8', 'Attribution9', 'Attribution10'}, 
{'TIME_FRAME1', 'TIME_FRAME2', 'TIME_FRAME3', 'TIME_FRAME4', 'TIME_FRAME5', 'TIME_FRAME6', 'TIME_FRAME7'}, 
{'DATE1'}, 
{'FILE_TYPE1', 'FILE_TYPE2'}]

What I want the final result to look like is something like:

TEAM1    STRATEGY1    ATTRIBUTION1    TIME_FRAME1    DATE1    FILE_TYPE1
         STRATEGY2    ATTRIBUTION2    TIME_FRAME2             FILE_TYPE2
         ...          ...             ...                  
         etc.         etc.            etc.

But only the first line actually gets stored in the CSV file.

Can anyone help me understand how to iterate past the first line? I'm sure this is happening because the team category has only one value, but I don't want that to limit the output.

Hofbr

1 Answer


You have to transpose the result before writing it. I based this on the answer to the post below:
Python - Transposing a list (rows with different length) using numpy fails

I used natural sorting so that the numeric suffixes sort in numeric order, and padded the rows with blanks to get the expected outcome. This natural sort can be slow on larger lists; you can also use a third-party library. See:

Does Python have a built in function for string natural sort?
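
To illustrate why a plain sort is not enough here, a small sketch comparing the two orderings. The helper name natural_key and the sample values are just for this example.

```python
import re

def natural_key(text):
    # Split into digit and non-digit runs; digit runs compare as integers.
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r'([0-9]+)', text)]

values = {'STRATEGY10', 'STRATEGY2', 'STRATEGY1', 'STRATEGY9'}

plain = sorted(values)                      # character-by-character comparison
natural = sorted(values, key=natural_key)   # numeric suffix compared as a number

print(plain)    # ['STRATEGY1', 'STRATEGY10', 'STRATEGY2', 'STRATEGY9']
print(natural)  # ['STRATEGY1', 'STRATEGY2', 'STRATEGY9', 'STRATEGY10']
```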

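import csv
import re

def natural_sort(l):
    # Split each string into text and digit runs so 'STRATEGY10' sorts after 'STRATEGY9'.
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
    return sorted(l, key=alphanum_key)

# `columns` is assumed here to be the list of category sets (L in the question).
# Create one output row per element of the longest category.
res = [[] for _ in range(max(len(sl) for sl in columns))]
for sl in columns:
    sorted_sl = natural_sort(sl)
    # zip stops at the shorter input, so only the first len(sorted_sl) rows get a value.
    for x, res_sl in zip(sorted_sl, res):
        res_sl.append(x)

# Every row after the first gets a leading blank for the single-valued team column.
for result in res[1:]:
    result.insert(0, '')

# The with block closes the file, so no explicit f.close() is needed.
with open("test.csv", 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(res)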

The columns should be converted into a list before writing to the CSV file; the writerows method can then be used to write multiple rows at once. You can find more information at https://docs.python.org/2/library/csv.html

TEAM1,STRATEGY1,ATTRIBUTION1,TIME_FRAME1,DATE1,FILE_TYPE1
,STRATEGY2,Attribution3,TIME_FRAME2,FILE_TYPE2
,STRATEGY3,Attribution4,TIME_FRAME3
,STRATEGY4,Attribution5,TIME_FRAME4
,STRATEGY5,Attribution6,TIME_FRAME5
,STRATEGY6,Attribution7,TIME_FRAME6
,STRATEGY7,Attribution8,TIME_FRAME7
,STRATEGY8,Attribution9
,STRATEGY9,Attribution10
,STRATEGY10
,STRATEGY11
,STRATEGY12
,STRATEGY13
,STRATEGY14
,STRATEGY15
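
Note that in the output above the shorter rows shift left, so FILE_TYPE2 ends up under the date column. As a possible alternative, not the approach used above, itertools.zip_longest pads the shorter columns with blanks so that every row keeps six cells. Plain zip stops at the shortest input, which is why only one row came out in the question. This is a sketch: the category sets are abbreviated stand-ins for the ones in the question, and the output path is hypothetical.

```python
import csv
import os
import re
import tempfile
from itertools import zip_longest

def natural_key(text):
    # Digit runs compare as integers, so 'STRATEGY10' sorts after 'STRATEGY2'.
    return [int(t) if t.isdigit() else t.lower() for t in re.split(r'([0-9]+)', text)]

# Abbreviated stand-ins for the category sets in the question.
L = [
    {'TEAM1'},
    {'STRATEGY1', 'STRATEGY2', 'STRATEGY10'},
    {'ATTRIBUTION1', 'Attribution3'},
    {'TIME_FRAME1', 'TIME_FRAME2'},
    {'DATE1'},
    {'FILE_TYPE1', 'FILE_TYPE2'},
]

# Sort each category, then transpose; missing cells are filled with ''.
sorted_columns = [sorted(col, key=natural_key) for col in L]
rows = list(zip_longest(*sorted_columns, fillvalue=''))

# Hypothetical output location, chosen only for this example.
out_path = os.path.join(tempfile.gettempdir(), 'test_padded.csv')
with open(out_path, 'w', newline='') as f:
    csv.writer(f).writerows(rows)

for row in rows:
    print(row)
```

With this layout the second row keeps an empty cell under the date column, so FILE_TYPE2 stays in the last column.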
user_D_A__