I was able to get your proposed code to work but it requires some tweaks to the code and data:
- There must be some additional step where you are deserializing the
students
column of rooms.csv
to a collection of objects. It appears to be a ScriptBlock
that evaluates to an array of HashTable
s, but some changes to the CSV input are still needed:
- The
StartDate
and EndDate
properties need to be quoted and cast to [DateTime]
.
- At least for rooms that contain multiple students, the value must be quoted so
Import-Csv
doesn't interpret the ,
separating array elements as an additional column.
- The downside of using CSV as an intermediate format is the original property types are lost; everything becomes a
[String]
upon import. Sometimes it's desirable to cast back to the original type for efficiency purposes, and sometimes it's absolutely necessary in order for certain operations to work. You could cast those properties every time you use them, but I prefer to cast them once immediately after import.
With those changes rooms.csv
becomes...
roomName, roomNo, students
room1, 1, "{@{StudentNo=111; StudentName='john smith'; StartDate=[DateTime] '2018-01-01T00:00:00'; EndDate=[DateTime] '2018-07-06T00:00:00'}}"
room2, 2, "{@{StudentNo=222; StudentName='jane doe'; StartDate=[DateTime] '2018-01-01T00:00:00'; EndDate=[DateTime] '2018-07-06T00:00:00'}}"
...and the script becomes...
# Replace the [String] property "students" with an array of [HashTable] property "Students"
$rooms = Import-Csv rooms.csv `
| Select-Object `
-ExcludeProperty 'students' `
-Property '*', @{
Name = 'Students'
Expression = {
$studentsText = $_.students
$studentsScriptBlock = Invoke-Expression -Command $studentsText
$studentsArray = @(& $studentsScriptBlock)
return $studentsArray
}
}
# Replace the [String] property "StudentNo" with an [Int32] property of the same name
$students = Import-Csv students.csv `
| Select-Object `
-ExcludeProperty 'StudentNo' `
-Property '*', @{
Name = 'StudentNo'
Expression = { [Int32] $_.StudentNo }
}
$combined = $students `
| Select-Object -Property `
'StudentNo', `
'PreferredSurname', `
'PreferredFirstnames', `
'UPN', `
@{
Name = "roomName";
Expression = {
foreach ($r in $rooms)
{
if ($r.Students.StudentNo -contains $_.StudentNo)
{
return $r.roomName
}
}
#TODO: Return text indicating room not found?
}
}
The reason this can be slow is because you are performing a linear search - two of them, in fact - for every student object; first through the collection of rooms (foreach
), then through the collection of students in each room (-contains
). This can quickly turn into a lot of iterations and equality comparisons because in every room to which the current student is not assigned you are iterating the entire collection of that room's students, on and on until you do find the room for that student.
One easy optimization you can make when performing a linear search is to sort the items you're searching (in this case, the Students
property will be ordered by the StudentNo
property of each student)...
# Replace the [String] property "students" with an array of [HashTable] property "Students"
$rooms = Import-Csv rooms.csv `
| Select-Object `
-ExcludeProperty 'students' `
-Property '*', @{
Name = 'Students'
Expression = {
$studentsText = $_.students
$studentsScriptBlock = Invoke-Expression -Command $studentsText
$studentsArray = @(& $studentsScriptBlock) `
| Sort-Object -Property @{ Expression = { $_.StudentNo } }
return $studentsArray
}
}
...and then when you're searching that same collection if you come across an item that is greater than the one you're searching for you know the remainder of the collection can't possibly contain what you're searching for and you can immediately abort the search...
@{
Name = "roomName";
Expression = {
foreach ($r in $rooms)
{
# Requires $room.Students to be sorted by StudentNo
foreach ($roomStudentNo in $r.Students.StudentNo)
{
if ($roomStudentNo -eq $_.StudentNo)
{
# Return the matched room name and stop searching this and further rooms
return $r.roomName
}
elseif ($roomStudentNo -gt $_.StudentNo)
{
# Stop searching this room
break
}
# $roomStudentNo is less than $_.StudentNo; keep searching this room
}
}
#TODO: Return text indicating room not found?
}
}
Better yet, with a sorted collection you can also perform a binary search, which is faster than a linear search*. The Array
class already provides a BinarySearch
static method, so we can accomplish this in less code, too...
@{
Name = "roomName";
Expression = {
foreach ($r in $rooms)
{
# Requires $room.Students to be sorted by StudentNo
if ([Array]::BinarySearch($r.Students.StudentNo, $_.StudentNo) -ge 0)
{
return $r.roomName
}
}
#TODO: Return text indicating room not found?
}
}
The way I would approach this problem, however, is to use a [HashTable]
mapping a StudentNo
to a room. There is a little preprocessing required to build the [HashTable]
but this will provide constant-time lookups when retrieving the room for a student.
function GetRoomsByStudentNoTable()
{
$table = @{ }
foreach ($room in $rooms)
{
foreach ($student in $room.Students)
{
#NOTE: It is assumed each student belongs to at most one room
$table[$student.StudentNo] = $room
}
}
return $table
}
# Replace the [String] property "students" with an array of [HashTable] property "Students"
$rooms = Import-Csv rooms.csv `
| Select-Object `
-ExcludeProperty 'students' `
-Property '*', @{
Name = 'Students'
Expression = {
$studentsText = $_.students
$studentsScriptBlock = Invoke-Expression -Command $studentsText
$studentsArray = @(& $studentsScriptBlock)
return $studentsArray
}
}
# Replace the [String] property "StudentNo" with an [Int32] property of the same name
$students = Import-Csv students.csv `
| Select-Object `
-ExcludeProperty 'StudentNo' `
-Property '*', @{
Name = 'StudentNo'
Expression = { [Int32] $_.StudentNo }
}
$roomsByStudentNo = GetRoomsByStudentNoTable
$combined = $students `
| Select-Object -Property `
'StudentNo', `
'PreferredSurname', `
'PreferredFirstnames', `
'UPN', `
@{
Name = "roomName";
Expression = {
$room = $roomsByStudentNo[$_.StudentNo]
if ($room -ne $null)
{
return $room.roomName
}
#TODO: Return text indicating room not found?
}
}
You can ameliorate the hit of building $roomsByStudentNo
by doing so at the same time as importing rooms.csv
...
# Replace the [String] property "students" with an array of [HashTable] property "Students"
$rooms = Import-Csv rooms.csv `
| Select-Object `
-ExcludeProperty 'students' `
-Property '*', @{
Name = 'Students'
Expression = {
$studentsText = $_.students
$studentsScriptBlock = Invoke-Expression -Command $studentsText
$studentsArray = @(& $studentsScriptBlock)
return $studentsArray
}
} `
| ForEach-Object -Begin {
$roomsByStudentNo = @{ }
} -Process {
foreach ($student in $_.Students)
{
#NOTE: It is assumed each student belongs to at most one room
$roomsByStudentNo[$student.StudentNo] = $_
}
return $_
}
*Except for on small arrays