Python: Sort list used as value in dictionary; data from csv

Question

I have a CSV file, titled jobData with this data:

EMPLOYEE,START_DATE,END_DATE,JOB,DIVISION

Tom     20180101    20191028    Job1    Div_B
Tom     20160101    20171231    Job1    Div_B
Tom     20150609    20151231    Job1    Div_B
Dick    20191001                Job4    Div_D
Dick    20190609    20190930    Job3    Div_C
Dick    20170309    20180608    Job2    Div_A
Dick    20160609    20170308    Job1    Div_B
Harry   20180701                Job2    Div_A
Harry   20180101    20180630    Job2    Div_A
Harry   20160101    20171231    Job1    Div_A

My objective is to structure the data in a dictionary so that each employee is the key, and the value is a list of that employee's jobs in chronological order.

For example: d = { Tom : [Job1], Dick : [Job1, Job2, Job3, Job4], Harry : [Job1, Job2]}

Currently I have this script:

import csv
jobDataFile = open('jobData.csv')
jobDataReader = csv.reader(jobDataFile)
jobData = list(jobDataReader)

dict = {}

for row in jobData:
    if row[0] not in dict.keys():
        dict[row[0]] = []
    else:
        if row[3] not in dict[row[0]]:
            dict[row[0]].append(row[3])

At this point I get a dictionary with the employee as the key and a list of jobs as the value, but the list items are not in chronological order.

How do I use the information in START_DATE to order the list in each value?

Use `insert` instead of `append` and define the `index` based on the date of the job. — po.pe, Nov 19 '19 at 14:58
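One way to read this suggestion is the following minimal sketch, which uses the standard-library bisect module to pick the insertion index from the start date. The rows list and the names history and jobs_by_employee are hypothetical stand-ins for the parsed CSV (header removed); they are not part of the original comment.

```python
import bisect

# Hypothetical rows standing in for the parsed CSV (header already removed)
rows = [
    ["Dick", "20191001", "", "Job4", "Div_D"],
    ["Dick", "20160609", "20170308", "Job1", "Div_B"],
    ["Dick", "20170309", "20180608", "Job2", "Div_A"],
    ["Tom", "20180101", "20191028", "Job1", "Div_B"],
    ["Tom", "20150609", "20151231", "Job1", "Div_B"],
]

history = {}  # employee -> list of (start_date, job), kept sorted
for employee, start, _end, job, _division in rows:
    # insort inserts the tuple at the index that keeps the list ordered
    # by start date (YYYYMMDD strings compare chronologically)
    bisect.insort(history.setdefault(employee, []), (start, job))

jobs_by_employee = {}
for employee, entries in history.items():
    ordered = []
    for _start, job in entries:
        if job not in ordered:  # keep only the first occurrence of each job
            ordered.append(job)
    jobs_by_employee[employee] = ordered

print(jobs_by_employee)
```

Keeping the start date next to each job is what makes the insertion index computable; the dates are dropped only once the order is settled.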

streetgang · Accepted Answer · 2019-11-30T00:49:08.273

I would sort the jobData list after reading it from the CSV. You have a list of lists and want to sort it by index 1 of each inner list, which is START_DATE.

This post will help you accomplish this: How to sort a list of lists by a specific index of the inner list?

Update, to be clearer: this is how I would prefer to do it, although it is not the accepted solution in the linked post. After a closer look at the other answers there, I would sort the list in place with a lambda expression, because it needs no import, whereas itemgetter has to be imported from the operator module. If performance matters, you may prefer itemgetter; at least the author of the accepted answer there says it is faster, though I cannot confirm that myself. Either way, for this case the sorting itself takes just 1 additional line:

jobData.sort(key=lambda x: x[1])

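In total this would be the following. Note that the job is appended even when the employee key has just been created; with the else branch from the question, each employee's first job would be skipped:

import csv

# newline='' is what the csv module recommends when opening files
with open('jobData.csv', newline='') as jobDataFile:
    jobDataReader = csv.reader(jobDataFile)
    next(jobDataReader)  # skip the header row so it is not sorted with the data
    jobData = list(jobDataReader)

# START_DATE is in YYYYMMDD form, so sorting the strings sorts chronologically
jobData.sort(key=lambda x: x[1])

jobs = {}  # named jobs rather than dict, to avoid shadowing the built-in

for row in jobData:
    if row[0] not in jobs:
        jobs[row[0]] = []
    # Checked for every row, so the employee's first job is not skipped
    if row[3] not in jobs[row[0]]:
        jobs[row[0]].append(row[3])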
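For comparison, the itemgetter variant mentioned above might look like the sketch below. The inline jobData rows are a hypothetical stand-in for what csv.reader would return (header removed), so the snippet runs without the file.

```python
from operator import itemgetter

# Hypothetical rows standing in for list(csv.reader(...)) without the header
jobData = [
    ["Harry", "20180701", "", "Job2", "Div_A"],
    ["Harry", "20180101", "20180630", "Job2", "Div_A"],
    ["Harry", "20160101", "20171231", "Job1", "Div_A"],
]

# itemgetter(1) builds a callable that behaves like lambda x: x[1]
jobData.sort(key=itemgetter(1))

print([row[3] for row in jobData])
```

Both forms sort in place by START_DATE; the only difference in the calling code is the import.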

Please provide more information in your answer, it's not quite complete. — sebasaenz, Nov 19 '19 at 15:22

po.pe · Answer 2 · 2019-11-19T15:48:54.637

Here's an option using the sort method of lists, with each row kept as a whitespace-separated string:

jobData = ["Tom 20180101 20191028 Job1 Div_B",
           "Tom 20160101 20171231 Job1 Div_B",
           "Tom 20150609 20151231 Job1 Div_B",
           "Dick 20191001 Job4 Div_D",
           "Dick 20190609 20190930 Job3 Div_C",
           "Dick 20170309 20180608 Job2 Div_A",
           "Dick 20160609 20170308 Job1 Div_B",
           "Harry 20180701 Job2 Div_A",
           "Harry 20180101 20180630 Job2 Div_A",
           "Harry 20160101 20171231 Job1 Div_A"]

def sort_date(string):
    # The start date is the second whitespace-separated field
    return string.split()[1]

jobData.sort(key=sort_date)

jobs = {}  # named jobs rather than dict, to avoid shadowing the built-in
for line in jobData:
    fields = line.split()
    name = fields[0]
    # Not all entries have an end date, so a row has 4 or 5 fields;
    # in both cases the job is the second-to-last field
    job = fields[-2]
    if name not in jobs:
        jobs[name] = []
    if job not in jobs[name]:
        jobs[name].append(job)

print(jobs)

Output:

{'Tom': ['Job1'], 'Harry': ['Job1', 'Job2'], 'Dick': ['Job1', 'Job2', 'Job3', 'Job4']}

PS: If you need the dict keys to have the same order as in the input, you can create them before you do the list sort.
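A minimal sketch of that idea, assuming Python 3.7+ (where dictionaries preserve insertion order) and a shortened, hypothetical sample of the rows. The name jobs is illustrative, and the job is read as the second-to-last field because the end date may be missing.

```python
jobData = [
    "Tom 20180101 20191028 Job1 Div_B",
    "Dick 20191001 Job4 Div_D",
    "Dick 20160609 20170308 Job1 Div_B",
    "Harry 20160101 20171231 Job1 Div_A",
]

# Create the keys first, while the rows are still in input order
jobs = {line.split()[0]: [] for line in jobData}

# Now sort by start date and fill the lists chronologically
jobData.sort(key=lambda line: line.split()[1])
for line in jobData:
    fields = line.split()
    name, job = fields[0], fields[-2]
    if job not in jobs[name]:
        jobs[name].append(job)

print(jobs)
```

The keys stay in the order Tom, Dick, Harry even though sorting moves the rows around, because the sort only affects the list, not the dictionary that was already built.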
