I am attempting to solve the following problem:
Given a number of similarly formatted text files (~800 MB worth of them, in my case), retrieve all lines from them and remove duplicates.
I attempted to solve this problem by running this command:
cat *.txt | Sort-Object -Unique > output.txt
PowerShell then quickly consumed all my available RAM (over 16 GB) and ran for over 20 minutes without writing anything to the output file.
I then ran cat *.txt > output.log
to rule out the possibility of the shell reading the file it was writing to, but that command still maxed out all RAM and produced almost no output.
Why did this happen? How can 800 MB of files on disk consume all available RAM when they are merely being concatenated?
And how can this problem be solved more efficiently in PowerShell?
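For reference, the kind of streaming approach I would hope to use instead is sketched below. It is untested on my data; it relies on a .NET HashSet to remember lines already seen, and the output path (here output.out, deliberately not matching *.txt so the output file is never re-read as input) is just a placeholder:

```powershell
# Stream lines one at a time instead of buffering everything in memory.
# HashSet[string]::Add() returns $false for a line already seen, so
# Where-Object passes each distinct line through exactly once.
$seen = [System.Collections.Generic.HashSet[string]]::new()
Get-ChildItem -Filter *.txt |
    Get-Content |
    Where-Object { $seen.Add($_) } |
    Set-Content output.out
```

Memory use with this sketch should be bounded by the number of distinct lines rather than by the total input size, but I don't know whether this is the idiomatic way to do it.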
The value of $PSVersionTable, in case it helps:
Name                           Value
----                           -----
PSVersion                      5.1.19041.1682
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.19041.1682
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
Thanks in advance.