7

I've got a huge XML file (0.5 GB), with no line breaks. I want to be able to look at, say, the first 200 characters without opening the whole file. Is there a way to do this with PowerShell?

Peter Mortensen
Jonny Cundall
  • It looks to me that Get-Content is going to be effectively loading the whole file, so that's not what I'm looking for - unless there's some lazy evaluating magic in gc that I can't find any documentation for. – Jonny Cundall Sep 21 '13 at 17:38
  • [This answer](http://stackoverflow.com/a/11010158/2707864) to http://stackoverflow.com/questions/1001776/how-can-i-split-a-text-file-using-powershell can be used as a basis. It might work faster than [this answer](http://stackoverflow.com/a/18936628/2707864) below if the fragment to extract is large. This is a conclusion that I obtained from non-systematic tests. Try it as you see fit. – sancho.s ReinstateMonicaCellio Apr 14 '16 at 21:47

5 Answers

23

PowerShell Desktop (up to 5.1)

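You can read at the byte level with Get-Content and then decode the result. With -Encoding byte, -TotalCount counts bytes rather than lines, and [System.Text.Encoding]::Unicode assumes UTF-16 LE, so 200 bytes yield 100 characters (use [System.Text.Encoding]::UTF8 for a UTF-8 file):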

$bytes = Get-Content .\files.txt -Encoding byte -TotalCount 200
[System.Text.Encoding]::Unicode.GetString($bytes)

If the file is ASCII, you can simplify this to a cast, which returns an array of characters:

[char[]](Get-Content .\files.txt -Encoding byte -TotalCount 200)

PowerShell Core 6.0 and newer

PowerShell Core doesn't support -Encoding byte. It has been replaced by the -AsByteStream parameter.

$bytes = Get-Content .\file.txt -AsByteStream -TotalCount 200
[System.Text.Encoding]::Unicode.GetString($bytes)
Keith Hill
1

Copying binary files via PowerShell cmdlets tends to be a bit slow. You may, however, run the following commands from PowerShell to get decent performance. They copy the whole file and then truncate the copy to its first $nbytes bytes:

cmd /c copy /b "large file.ext" "first n.ext"
FSUTIL file seteof "first n.ext" $nbytes

Tested in Win 10 PS 5.1
Result: 1.43GB processed in 4 seconds

Zimba
0

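Get-Content takes a -TotalCount option so you can take only the first X lines. (-ReadCount only controls how many lines are sent down the pipeline at a time.) In a file with no line breaks, however, the first line is the whole file.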

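If you really want character granularity, you'll need to read through a .NET stream, for example the StreamReader returned by [IO.File]::OpenText, because the [IO.File]::ReadAll* methods load the whole file.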
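As a minimal sketch of that character-level approach: the function name Get-FirstChars and its parameters are hypothetical, and the file is assumed to be UTF-8 unless a byte-order mark says otherwise. It needs PowerShell 5.0 or newer for the ::new() syntax.

```powershell
# Hypothetical helper: returns the first $Count characters of a text file
# without loading the whole file into memory.
function Get-FirstChars {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [int] $Count = 200,
        # Assumption: UTF-8 unless a byte-order mark indicates another encoding.
        [System.Text.Encoding] $Encoding = [System.Text.Encoding]::UTF8
    )
    # .NET resolves relative paths against the process working directory,
    # which can differ from the current PowerShell location, so resolve first.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $reader = [System.IO.StreamReader]::new($fullPath, $Encoding, $true)
    try {
        $buffer = [char[]]::new($Count)
        # ReadBlock keeps reading until $Count characters are filled or the file ends.
        $read = $reader.ReadBlock($buffer, 0, $Count)
        [string]::new($buffer, 0, $read)
    }
    finally {
        $reader.Dispose()
    }
}
```

It could be called as Get-FirstChars -Path .\file.txt -Count 200. The StreamReader only pulls a small buffer from disk, so the size of the file should not matter much.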

Eris
0

@keith-hill got me most of the way there.

Here's what I used to get the first characters out of a VMware virtual disk. There is important information in the first 1000 or so characters, but I'd never get at it by trying to open a 30 GB file.

$bytes = Get-Content .\VMwareVirtualDiskFile.vmdk -Encoding byte -TotalCount 1000
[String]::Concat([char[]]($bytes))
Kevin Scharnhorst
-2

(get-content myfile).Substring(0,x)

Where x is the number of characters you want from each line. For example, $lines = (get-content myfile).Substring(0,10) will return an array of strings where each member of the array contains the first 10 characters of the corresponding line in myfile.

Pygar
  • Welcome to Stack Overflow. Please consider formatting your code differently than your text. You can use backticks (` `) to wrap your code. – sao Jan 24 '20 at 14:08
  • This does not answer the original question: they wanted the first X bytes of the entire file, not per line. This method is also extremely inefficient for large files, which was part of the original question. – Justin May 28 '20 at 19:47