
I have been searching for an answer, though I'm probably not using the right wording, and I only come up with results on using lists as dictionary key values.

I need to take 20 CSV files of testing data and anonymize the identifying student, teacher, school and district information for research purposes. The CSV files range from 20K to 50K rows and 11 to 20 columns, and not all of them contain the same information.

One file may have:

studid, termdates, testname, score, standarderr

And another may have:

termdates, studid, studfirstname, studlastname, studdob, ethnicity, grade

And yet another may have:

termdates, studid, teacher, classname, schoolname, districtname

I am putting the varying data into dictionaries, one for each type of file/dataset. Maybe this isn't the best approach, but I am getting stuck when trying to use a dictionary as a key value for when a student has taken multiple tests, e.g. Language, Reading, Math, etc.

For instance:

studDict{studid{'newid': 12345, 'dob': 1/1/1, test1:{'score': 50, 'date': 1/1/15}, test2:{'score': 50, 'date': 1/1/15}, 'school': 'Hard Knocks'},
        studid1{'newid': 12345, 'dob': 1/1/1, test1:{'score': 50, 'date': 1/1/15}, test2:{'score': 50, 'date': 1/1/15}, 'school': 'Hard Knocks'}}

Any guidance on which libraries to use, or a brief pointer to a method, would be greatly appreciated. I understand enough Python that I don't need full hand-holding, but help getting across the street would be great. :D

CLARIFICATION

I have a better chance of winning the lottery than this project does of being used more than once, so the simpler the method, the better. If this were a recurring project, I would most likely load the data into database tables and work from there.

b4hand
bmeredith
  • try sqldict python module, it may help you – sudhishkr May 20 '15 at 01:30
  • "this project will never be used more than once so I'll just hack it together" seems like the key phrase to use to ensure you're building a new core product for a company – Eric Renouf May 20 '15 at 01:46
  • It was dumped on my lap via another department who got it via another department and they need it 'yesterday' lol. – bmeredith May 20 '15 at 01:50
  • There may be two ways to approach this. One, as someone has mentioned, is pandas: you can read each CSV into a separate DataFrame and combine them into a pandas Panel, which may be what you want. But it's hard to say precisely how without some sample data to get a feel for, and without knowing what you want in the end, so I'm not sure how useful this approach is. – gabhijit May 20 '15 at 02:34

5 Answers


You cannot use a dictionary as a key in a dictionary. Keys must be hashable, and because dictionaries are mutable they are not hashable, so they cannot be used as keys.
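A quick interpreter check of this (a minimal sketch with made-up values): a dict as a key raises `TypeError`, a tuple of hashable values works as a key, and a dict works fine as a value.

```python
# A dict is mutable and unhashable, so using one as a key fails.
record = {"studid": "12345", "test": "Reading"}
lookup = {}
try:
    lookup[record] = 50
except TypeError as exc:
    error_message = str(exc)  # e.g. "unhashable type: 'dict'"

# A tuple of hashable values is hashable, so it can be a key.
lookup[("12345", "Reading")] = 50

# A dict can always be stored as a *value*.
lookup["12345"] = {"Reading": {"score": 50, "date": "1/1/15"}}
print(error_message)
print(lookup)
```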

You can store a dictionary inside another dictionary just like any other value. For example, you can do:

# Dates are quoted: an unquoted 1/1/15 would be evaluated as division
studDict = {studid: {'newid': 12345, 'dob': '1/1/1', 'test1': {'score': 50, 'date': '1/1/15'}, 'test2': {'score': 50, 'date': '1/1/15'}, 'school': 'Hard Knocks'},
            studid1: {'newid': 12345, 'dob': '1/1/1', 'test1': {'score': 50, 'date': '1/1/15'}, 'test2': {'score': 50, 'date': '1/1/15'}, 'school': 'Hard Knocks'}}

assuming you have defined studid and studid1 elsewhere.

Eric Renouf
  • Right, but I'm asking about using a dictionary as a value to a key, not the key itself, maybe I should re-word that. – bmeredith May 20 '15 at 01:31
  • @bmeredith: No idea what that means. Can you give us an example of values you are having problem with, and how you tried to use them? – Amadan May 20 '15 at 01:32
  • @bmeredith What is a "key value"? There are "keys" and there are "values" but I'm not sure what a "key value" is. If you want to use them as values, there shouldn't be any difficulty in doing `dict1[ key ] = dict2` and then `dict2` will be the value in `dict1` mapped to `key` – Eric Renouf May 20 '15 at 01:32
  • Just updated with a quick example, sorry about that. Looking into using a dictionary as the value to a key – bmeredith May 20 '15 at 01:36
  • @bmeredith ok, updated for what I hope you were asking about, note that I have a `:` after the `studid` to make it a key and the thing after the `:` is the value, which in this case happens to be a dictionary too – Eric Renouf May 20 '15 at 01:43
  • That's the route I tried to take (wish I had my work laptop with me to show you some code), but essentially when I went to update studDict['studid1']['test1'] = {'score': 50, 'date': 1/1/1, etc etc} I get a KeyError on studid1. Any idea why that might be? – bmeredith May 20 '15 at 01:46
  • @bmeredith Well, make sure all the strings and variables line up. In the example code you're using `studid1` as a variable name, but here it's a string, so that would be the first thing to check. If you're getting the key names from a file, make sure you `strip` whitespace, that the case matches, and so on. – Eric Renouf May 20 '15 at 01:59
  • studid1, studid2, etc. are pulled from the first set of files that I parse and used as the dict names that house each student's specific data. The files are a dump created by our system, so I didn't even think to strip the data before processing, since it's created by a machine. I will give that a go in the morning. – bmeredith May 20 '15 at 02:08
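One common cause of a KeyError like the one described is assigning `studDict['studid1']['test1']` before `studDict['studid1']` itself exists, or a key that differs by stray whitespace. A minimal sketch with made-up rows of two standard ways around it, `dict.setdefault` and `collections.defaultdict`, combined with `strip()` on the ID:

```python
from collections import defaultdict

# Hypothetical rows: (studid, testname, score, date); note the stray space.
rows = [
    ("studid1 ", "test1", 50, "1/1/15"),
    ("studid1", "test2", 48, "2/1/15"),
    ("studid2", "test1", 61, "1/1/15"),
]

# Option 1: a plain dict; setdefault inserts an empty inner dict on first use.
studDict = {}
for studid, testname, score, date in rows:
    student = studDict.setdefault(studid.strip(), {})
    student[testname] = {"score": score, "date": date}

# Option 2: defaultdict(dict) creates the inner dict automatically.
studDict2 = defaultdict(dict)
for studid, testname, score, date in rows:
    studDict2[studid.strip()][testname] = {"score": score, "date": date}

print(studDict["studid1"]["test2"])  # {'score': 48, 'date': '2/1/15'}
```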

A dictionary cannot be a key, but a dictionary can be a value for some key in another dictionary (a dict-of-dicts). However, instantiating dictionaries of varying length for every tuple is probably going to make your data analysis very difficult.

Consider using Pandas to read the tuples into a DataFrame with null values where appropriate.
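A minimal sketch of that suggestion, assuming pandas is installed; the two in-memory CSVs are made-up stand-ins for the real exports. An outer merge on `studid` keeps every student and fills the columns a student has no data for with NaN.

```python
import io

import pandas as pd

# Made-up stand-ins for two CSV exports with different columns.
tests_csv = io.StringIO(
    "studid,termdates,testname,score\n"
    "12345,2015-05-19,Reading,96\n"
    "67890,2015-04-28,Math,92\n"
)
demo_csv = io.StringIO(
    "studid,studfirstname,grade\n"
    "12345,Ann,10\n"
)

tests = pd.read_csv(tests_csv)
demo = pd.read_csv(demo_csv)

# Outer join on the shared student ID: all students are kept, and missing
# values (here, demographics for 67890) become NaN.
combined = pd.merge(tests, demo, on="studid", how="outer")
print(combined)
```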

dict API: https://docs.python.org/2/library/stdtypes.html#mapping-types-dict

Pandas Data handling package: http://pandas.pydata.org/

manglano

If I interpret you correctly, in the end you want a dict with students (i.e. studid) as keys and the related student data as values? This is probably not exactly what you want, but I think it will point you in the right direction (adapted from this answer):

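import csv
from collections import namedtuple, defaultdict

files = ["tests.csv", "demographics.csv"]  # hypothetical: your list of CSV paths

D = defaultdict(list)
for filename in files:
    with open(filename, newline="") as infile:
        reader = csv.reader(infile, skipinitialspace=True)
        # Header names become namedtuple fields; strip them so they are valid identifiers
        Data = namedtuple("Data", [name.strip() for name in next(reader)])
        for row in reader:
            data = Data(*row)
            D[data.studid].append(data)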

In the end, that should give you a dict D with studids as keys and a list of records (test results and other rows) as values. Each record is a namedtuple. This assumes that every file has a studid column!
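A self-contained variant of the code above, with two in-memory strings standing in for real files (made-up data), plus one possible anonymization step for the question's goal: each real `studid` gets a sequential new ID, which also overwrites the `studid` field in every record. The numbering scheme is a hypothetical illustration, not part of the original answer.

```python
import csv
import io
from collections import defaultdict, namedtuple

# Hypothetical stand-ins for two of the CSV files
# (spaces after the commas, as in the question's column lists).
files = {
    "tests.csv": "studid, termdates, testname, score\n"
                 "12345, 2015-05-19, Reading, 96\n"
                 "12345, 2015-05-20, Math, 88\n",
    "demo.csv": "termdates, studid, grade\n"
                "2015-05-19, 12345, 10\n"
                "2015-05-19, 67890, 11\n",
}

D = defaultdict(list)
for name, text in files.items():
    # skipinitialspace drops the space after each comma, in headers and values.
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    # Each file gets its own namedtuple type because the columns differ.
    Data = namedtuple("Data", next(reader))
    for row in reader:
        data = Data(*row)
        D[data.studid].append(data)

# Anonymize: map each real studid to a sequential new ID (hypothetical scheme)
# and replace the studid field in every record with that new ID.
new_ids = {studid: i for i, studid in enumerate(D, start=1)}
anonymized = {
    new_ids[studid]: [record._replace(studid=new_ids[studid]) for record in records]
    for studid, records in D.items()
}
print(anonymized[2])
```

Keeping `new_ids` around (or writing it to a separate, protected file) is what would let the researchers' results be linked back later, if that is ever required.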

jorgeh

If you know the column order of a file ahead of time, it's not hard to make a dictionary for each row with help from csv.

File tests.csv:

12345,2015-05-19,AP_Bio,96,0.12
67890,2015-04-28,AP_Calc,92,0.17

In a Python file in the same directory as tests.csv:

import csv

with open("tests.csv", newline="") as tests:
    # Change the fields for files that follow a different form
    fields = ["studid", "termdates", "testname", "score", "standarderr"]
    students_data = list(csv.DictReader(tests, fieldnames=fields))

# Print one row per line (output as shown in Python 3.8+)
print(*students_data, sep="\n")
# {'studid': '12345', 'termdates': '2015-05-19', 'testname': 'AP_Bio', 'score': '96', 'standarderr': '0.12'}
# {'studid': '67890', 'termdates': '2015-04-28', 'testname': 'AP_Calc', 'score': '92', 'standarderr': '0.17'}
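If a file starts with a header row (the column lists in the question suggest so, but that is an assumption), `csv.DictReader` can take the field names from that row instead of a hard-coded list. A minimal sketch with made-up data; `skipinitialspace=True` drops the spaces after the commas.

```python
import csv
import io

# Made-up file contents that start with a header row.
text = (
    "studid, termdates, testname, score, standarderr\n"
    "12345, 2015-05-19, AP_Bio, 96, 0.12\n"
    "67890, 2015-04-28, AP_Calc, 92, 0.17\n"
)

# With no fieldnames argument, DictReader uses the first row as the header.
reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
rows = list(reader)
print(reader.fieldnames)  # ['studid', 'termdates', 'testname', 'score', 'standarderr']
print(rows[1]["testname"])  # AP_Calc
```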
Navith
  • This is the method that is in place right now, but the layout will essentially be a dictionary of dictionaries, which may have dictionaries as the values for certain keys, and that is what's throwing me a KeyError – bmeredith May 20 '15 at 02:11
  • Can you explain? Is your goal to have a dictionary with keys of student IDs and values of lists of data on their tests? – Navith May 20 '15 at 02:20
  • So essentially studDict{studID1{key:val, key:val, key:{key:val, key:val, key:val}, key:val, key:val}, studID2{key:val, key:val, key:{key:val, key:val, key:val}, key:val, key:val}} – bmeredith May 20 '15 at 02:24
  • I don't think dictionaries are the right data structure if you want to mix dictionaries and regular values. Make a `Student` class and use attributes instead. – Navith May 20 '15 at 02:43
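A minimal sketch of the class-based suggestion, using a dataclass (Python 3.7+). The field and method names are hypothetical, loosely based on the question's columns: fixed attributes hold the per-student values, and one dict holds however many tests a student took.

```python
from dataclasses import dataclass, field


@dataclass
class Student:
    newid: int
    dob: str = ""
    school: str = ""
    # testname -> {"score": ..., "date": ...}; one entry per test taken.
    tests: dict = field(default_factory=dict)

    def add_test(self, testname, score, date):
        self.tests[testname] = {"score": score, "date": date}


students = {}
students["studid1"] = Student(newid=1, dob="1/1/1", school="Hard Knocks")
students["studid1"].add_test("Reading", 50, "1/1/15")
students["studid1"].add_test("Math", 47, "1/1/15")
print(students["studid1"].tests["Math"]["score"])  # 47
```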

Please be more explicit; the solution depends on your design.

In a district you have schools, and in each school you have teachers and students.

First, organize your data by district and school:

    districts = { 
                 "name_district1":{...}, 
                 "name_district2":{...},
                 ...,
                 "name_districtn":{...},
                }

For each district:

    # "name_districtn"
      {
        "name_school1": {...},
        "name_school2": {...},
        ...,
        "name_schooln": {...},
      }

For each school (e.g. "name_schooln"):

    {
      id_student1: {...},
      id_student2: {...},
      ...,
      id_studentn: {...}
    }

Then, for each student, you define their fields.
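One way to fill that district, school, student nesting from flat CSV rows is chained `dict.setdefault` calls. A sketch with made-up rows; the column names (`districtname`, `schoolname`, `studid`, `testname`, `score`) are assumptions borrowed from the question's examples.

```python
# Hypothetical flat rows, e.g. as produced by csv.DictReader.
rows = [
    {"districtname": "D1", "schoolname": "Hard Knocks", "studid": "12345", "testname": "Reading", "score": "50"},
    {"districtname": "D1", "schoolname": "Hard Knocks", "studid": "12345", "testname": "Math", "score": "47"},
    {"districtname": "D2", "schoolname": "Elm", "studid": "67890", "testname": "Reading", "score": "61"},
]

districts = {}
for row in rows:
    # Each setdefault returns the existing inner dict or inserts an empty one.
    school = districts.setdefault(row["districtname"], {}).setdefault(row["schoolname"], {})
    student = school.setdefault(row["studid"], {})
    student[row["testname"]] = int(row["score"])

print(districts["D1"]["Hard Knocks"]["12345"])  # {'Reading': 50, 'Math': 47}
```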

You can also use a single dictionary for all students, but in that case you have to build a unique ID for each student, for example:

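    # A separator keeps e.g. ("ab", "c") and ("a", "bc") from producing the same ID
    uniq_Id = "|".join(("name_district", "name_school", str(student_id)))
    Total = {
        uniq_Id: {'dob': '1/1/1', 'test1': {'score': 50, 'date': '1/1/15'}, 'test2': {'score': 50, 'date': '1/1/15'}, 'school': 'Hard Knocks'},
        # ... one entry per student
    }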
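An alternative to building a string ID (a swapped-in technique, not this answer's own): tuples of strings are hashable, so the (district, school, student) triple can be the key directly, and different triples can never collapse into the same key the way plain string concatenation can. Made-up data:

```python
# Tuples of hashable values (here strings) are valid dict keys.
Total = {}
Total[("name_district", "name_school", "12345")] = {"dob": "1/1/1", "school": "Hard Knocks"}

# With a plain "".join, different parts can collapse into the same string...
print("".join(("ab", "c")) == "".join(("a", "bc")))  # True
# ...whereas tuple keys keep the parts separate, so these are two entries.
Total[("ab", "c", "1")] = {}
Total[("a", "bc", "1")] = {}
print(len(Total))  # 3
```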