
How do I copy all PDF files from a directory, and its subdirectories, to a single directory?

There are actually many more files, nested to a somewhat arbitrary depth. It's fair to assume a maximum depth of four directories.

I suppose the files need to be renamed, in case a.pdf, for example, exists in several directories. Because I'll be adding the files to Calibre, duplicates are preferred over leaving out files. (I'm not looking to check files against each other for duplicates.)

Following KISS principles:

PS /home/nicholas/to> 
PS /home/nicholas/to> Copy-Item -path "/home/nicholas/from" -include "*.pdf" -Destination "/home/nicholas/to"
PS /home/nicholas/to> 
PS /home/nicholas/to> ls /home/nicholas/to
PS /home/nicholas/to> 
PS /home/nicholas/to> ls /home/nicholas/from
one  two
PS /home/nicholas/to> 
PS /home/nicholas/to> tree /home/nicholas/from
/home/nicholas/from
├── one
│   ├── a.pdf
│   ├── b.pdf
│   └── foo.txt
└── two
    ├── bar.txt
    ├── c.pdf
    └── d.pdf

2 directories, 6 files
PS /home/nicholas/to> 

Obviously, the above attempt copies nothing: it doesn't traverse into subdirectories, and it doesn't deal with name clashes.
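A minimal sketch of just the enumeration step, assuming PowerShell 3 or later: Get-ChildItem can do the recursive walk itself, so the *.pdf filter can live there instead of on Copy-Item. Get-PdfFile is a hypothetical wrapper name used only for illustration.

```powershell
# Hypothetical wrapper (not a built-in cmdlet): list every *.pdf file
# under $Root, at any depth.
function Get-PdfFile {
    param(
        [Parameter(Mandatory)] [string] $Root
    )
    # -Recurse walks all subdirectories, -Filter keeps only names matching
    # *.pdf, and -File leaves out directories.
    Get-ChildItem -Path $Root -Filter '*.pdf' -File -Recurse
}
```

For example, `Get-PdfFile -Root /home/nicholas/from` should return a.pdf, b.pdf, c.pdf and d.pdf from the sample tree and skip foo.txt and bar.txt.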

It probably makes sense to rename each PDF as it's copied. The -Recurse parameter seems useful:

PS /home/nicholas/to> 
PS /home/nicholas/to> ls
PS /home/nicholas/to> 
PS /home/nicholas/to> Copy-Item -Path "/home/nicholas/from" -Destination "/home/nicholas/to" -Recurse
PS /home/nicholas/to> 
PS /home/nicholas/to> tree
.
└── from
    ├── one
    │   ├── a.pdf
    │   ├── b.pdf
    │   └── foo.txt
    └── two
        ├── bar.txt
        ├── c.pdf
        └── d.pdf

3 directories, 6 files
PS /home/nicholas/to> 

However, I'm not sure how to filter out the .txt files and put everything into a single directory.
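One way to do both, sketched under the assumption that same-named files may overwrite each other for now: filter in Get-ChildItem and pipe the results into Copy-Item with a single destination folder. Copy-PdfFlat is a hypothetical function name.

```powershell
# Hypothetical function: copy every PDF under $Source into the single
# folder $Destination, flattening the directory structure.
function Copy-PdfFlat {
    param(
        [Parameter(Mandatory)] [string] $Source,
        [Parameter(Mandatory)] [string] $Destination
    )
    # Make sure the destination folder exists
    New-Item -ItemType Directory -Path $Destination -Force | Out-Null
    # Filtering happens in Get-ChildItem, so .txt files never reach Copy-Item.
    # Piped FileInfo objects bind to Copy-Item's -LiteralPath.
    Get-ChildItem -Path $Source -Filter '*.pdf' -File -Recurse |
        Copy-Item -Destination $Destination
}
```

With the sample tree, the destination should end up holding only a.pdf, b.pdf, c.pdf and d.pdf. Nothing is renamed, so this alone doesn't solve the name-clash problem: a second a.pdf would overwrite the first.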

Copying the files into a single directory partly works, although the .txt files are copied as well:

PS /home/nicholas/to> 
PS /home/nicholas/to> ls
PS /home/nicholas/to> 
PS /home/nicholas/to> tree /home/nicholas/from/
/home/nicholas/from/
├── one
│   ├── a.pdf
│   ├── b.pdf
│   └── foo.txt
└── two
    ├── bar.txt
    ├── c.pdf
    └── d.pdf

2 directories, 6 files
PS /home/nicholas/to> 
PS /home/nicholas/to> Get-ChildItem /home/nicholas/from -File -Recurse | Copy-Item -Destination /home/nicholas/to -filter '*.pdf'
PS /home/nicholas/to> 
PS /home/nicholas/to> tree
.
├── a.pdf
├── bar.txt
├── b.pdf
├── c.pdf
├── d.pdf
└── foo.txt

0 directories, 6 files
PS /home/nicholas/to> 

But how do I add logic to rename the files with an incrementing pattern like 1.pdf, 2.pdf, etc.?

I'm looking to "merge" folders of PDFs into a single directory.
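A minimal sketch that extends the Get-ChildItem pipeline above with a counter, assuming the destination starts out empty (an existing 1.pdf, 2.pdf, ... there would be overwritten). Because every copy gets a fresh number, two files both named a.pdf can't collide. Copy-PdfNumbered is a hypothetical name.

```powershell
# Hypothetical function: copy every PDF under $Source into $Destination
# as 1.pdf, 2.pdf, 3.pdf, ...
function Copy-PdfNumbered {
    param(
        [Parameter(Mandatory)] [string] $Source,
        [Parameter(Mandatory)] [string] $Destination
    )
    New-Item -ItemType Directory -Path $Destination -Force | Out-Null
    Get-ChildItem -Path $Source -Filter '*.pdf' -File -Recurse |
        ForEach-Object -Begin { $i = 1 } -Process {
            # Build the numbered target path, e.g. <Destination>/1.pdf
            $target = Join-Path -Path $Destination -ChildPath ('{0}.pdf' -f $i)
            # -LiteralPath: use the source path verbatim, even if it contains [ or ]
            Copy-Item -LiteralPath $_.FullName -Destination $target
            $i++
        }
}
```

Usage: `Copy-PdfNumbered -Source /home/nicholas/from -Destination /home/nicholas/to`. Which PDF becomes 1.pdf depends on the order Get-ChildItem returns the files in, and the original file names are not kept.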

2 Answers


You're on the right track for the most part:

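$PDFs = "C:\"
$i = 1

# Copy every PDF found under $PDFs into C:\NewFileDir
Get-ChildItem -Path $PDFs -Filter "*.pdf" -File -Recurse | ForEach-Object -Process {
    Copy-Item -LiteralPath $_.FullName -Destination "C:\NewFileDir" -Verbose
}

# Append a running number to each copied file's base name (a.pdf -> a1.pdf).
# The parentheses let Get-ChildItem finish listing before any file is renamed.
(Get-ChildItem -Path C:\NewFileDir -Filter "*.pdf" -File -Recurse) | ForEach-Object -Process {
    Rename-Item -LiteralPath $_.FullName -NewName ("{0}{1}.pdf" -f $_.BaseName, $i) -Verbose
    $i++
}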
Abraham Zinala
  • This will lead to name collisions when files with the same name have already been copied to the destination folder. Renaming afterwards won't help there. – Theo Feb 20 '21 at 16:23
  • @Theo, what would be a better approach then? Renaming beforehand? I'd suggest copying only the unique ones, but it's the file names that are the same, not the contents. – Abraham Zinala Feb 20 '21 at 16:58
  • Unless we were to rename the unique ones first, then copy them over? Or vice versa. – Abraham Zinala Feb 20 '21 at 17:00
  • Edit: See @Theo's function. – Abraham Zinala Feb 20 '21 at 17:05
  • I'm adding these to `calibre` and so using `calibre` to weed through the duplicates. I'd rather have duplicates than miss files, at least in this scenario. Pruning files seems a separate process. – Nicholas Saunders Feb 21 '21 at 01:19
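Picking up Theo's point: the clash has to be resolved before each copy, because once a file has been overwritten no later rename can bring it back. A sketch under that assumption, keeping a file's original name when it is free and otherwise appending " (1)", " (2)", and so on. Copy-PdfKeepNames is a hypothetical name; this is not the function of Theo's mentioned in the comments.

```powershell
# Hypothetical function: copy every PDF under $Source into $Destination,
# choosing a free name *before* each copy so nothing is overwritten.
# The first a.pdf keeps its name; later ones become "a (1).pdf", "a (2).pdf", ...
function Copy-PdfKeepNames {
    param(
        [Parameter(Mandatory)] [string] $Source,
        [Parameter(Mandatory)] [string] $Destination
    )
    New-Item -ItemType Directory -Path $Destination -Force | Out-Null
    # Collect the full list first, then copy
    $pdfs = @(Get-ChildItem -Path $Source -Filter '*.pdf' -File -Recurse)
    foreach ($pdf in $pdfs) {
        $target = Join-Path -Path $Destination -ChildPath $pdf.Name
        $n = 1
        # -LiteralPath so names containing [ or ] are checked verbatim
        while (Test-Path -LiteralPath $target) {
            $newName = '{0} ({1}){2}' -f $pdf.BaseName, $n, $pdf.Extension
            $target = Join-Path -Path $Destination -ChildPath $newName
            $n++
        }
        Copy-Item -LiteralPath $pdf.FullName -Destination $target
    }
}
```

This keeps readable file names for Calibre while still copying every duplicate, at the cost of one Test-Path check per candidate name.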

Mostly works:

nicholas@mordor:~/powershell/files$ 
nicholas@mordor:~/powershell/files$ pwsh copy_pdfs.ps1 
Copy-Item: /home/nicholas/powershell/files/copy_pdfs.ps1:9
Line |
   9 |      Copy-Item -path $pdf -Destination /home/nicholas/to/$i.pdf
     |      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | Cannot retrieve the dynamic parameters for the cmdlet. The specified wildcard character pattern is not valid: The
     | possible origins of 2019-nCoV coronavirus [DOI 10.13140@RG.22.21799.29601] [originsof2019-n

Copy-Item: /home/nicholas/powershell/files/copy_pdfs.ps1:9
Line |
   9 |      Copy-Item -path $pdf -Destination /home/nicholas/to/$i.pdf
     |      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     | Cannot retrieve the dynamic parameters for the cmdlet. The specified wildcard character pattern is not valid: The
     | possible origins of 2019-nCoV coronavirus [DOI 10.13140@RG.22.21799.29601] [originsof2019-n

done
nicholas@mordor:~/powershell/files$ 
nicholas@mordor:~/powershell/files$ cat copy_pdfs.ps1 
$file = Get-ChildItem /home/nicholas/pdfs -filter *.pdf -recurse 
$i = 1                           
foreach ($pdf in $file) {            
    Copy-Item -path $pdf -Destination /home/nicholas/to/$i.pdf
    $i++              
}

$file = Get-ChildItem -filter *.pdf -recurse 
write-host "done"
nicholas@mordor:~/powershell/files$ 

Criticism or alternative solutions appreciated. Thanks to weq on IRC for the logic.
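On the errors in the output above: Copy-Item -Path treats [ and ] as wildcard syntax, so a file name containing brackets, such as the "[DOI ...]" PDF, is parsed as a pattern, which the error message reports as invalid. -LiteralPath takes the name verbatim. Below is a sketch of the same loop with that change; it also passes $pdf.FullName instead of $pdf, since how a FileInfo object turns into a string (full path or bare name) has differed between PowerShell versions. The function and parameter names are hypothetical, added only so the paths aren't hard-coded, and the unused second Get-ChildItem line is dropped.

```powershell
# Hypothetical wrapper around the copy_pdfs.ps1 loop.
function Copy-PdfWithLiteralPath {
    param(
        [Parameter(Mandatory)] [string] $Source,      # e.g. /home/nicholas/pdfs
        [Parameter(Mandatory)] [string] $Destination  # e.g. /home/nicholas/to (must exist)
    )
    $file = Get-ChildItem -Path $Source -Filter *.pdf -Recurse
    $i = 1
    foreach ($pdf in $file) {
        # Change 1: -LiteralPath, so [ and ] in names are not wildcard syntax.
        # Change 2: pass the full path explicitly instead of the FileInfo object.
        Copy-Item -LiteralPath $pdf.FullName -Destination (Join-Path $Destination "$i.pdf")
        $i++
    }
    Write-Host "done"
}
```

Usage: `Copy-PdfWithLiteralPath -Source /home/nicholas/pdfs -Destination /home/nicholas/to`.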