
Sorry if this doesn't make much sense, I'm not much of a programmer.

I am using PowerShell to concatenate all of the files in a folder into a single larger file, but when I do this, the text comes out 'corrupted'.

I have a folder of Ancient Greek texts that all end with a .tess extension. These files come from https://github.com/cltk/grc_text_tesserae/tree/master/texts (I'm not sure how this extension works, but the files open fine in Notepad). I used:

Get-Content *.tess | Set-Content greekcorpus.tess

However, the text comes out scrambled. For example:

Σιδὼν ἐπὶ θαλάττῃ πόλις

Comes out as:

Σιδὼν ἐπὶ θαλαÌττῃ ποÌλιÏ"

Anyone know what could be going wrong? Thanks!

  • This would be a problem if you were using PowerShell 5.1, where `Get-Content` defaults to ANSI encoding. Try `Get-Content thefiles* -Encoding utf8 | Set-Content mergedfiles -Encoding utf8` – Santiago Squarzon Jul 19 '23 at 00:41
  • Possible duplicate: [Using UTF-8 Encoding (CHCP 65001) in Command Prompt / Windows Powershell (Windows 10)](https://stackoverflow.com/a/57134096/1701026) – iRon Jul 19 '23 at 06:02

1 Answer


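This should do the job. Read the files as UTF-8 and write the merged file as UTF-8 too:

Get-Content *.tess -Encoding UTF8 | Set-Content greekcorpus.tess -Encoding UTF8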
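
If it helps, here is a minimal sketch of the same idea wrapped in a function. `Merge-TessFiles`, its parameters, and the `greekcorpus.txt` default are hypothetical names chosen for illustration, not built-in cmdlets or anything from the question. It assumes the source files really are UTF-8, and it lists the input files before writing so that the merged file cannot end up among its own inputs.

```powershell
# Merge-TessFiles is a hypothetical helper name, not a built-in cmdlet.
function Merge-TessFiles {
    param(
        [string]$SourceDir = '.',
        [string]$OutFile   = 'greekcorpus.txt'   # assumed output name
    )

    # Collect the input files up front, in a stable order.
    $files = Get-ChildItem -Path $SourceDir -Filter *.tess -File | Sort-Object Name

    # Decode each file as UTF-8 and write the combined text back out as UTF-8.
    $files |
        Get-Content -Encoding UTF8 |
        Set-Content -Path $OutFile -Encoding UTF8
}
```

You could then call it as `Merge-TessFiles -SourceDir . -OutFile greekcorpus.txt`. Note that in Windows PowerShell 5.1, `Set-Content -Encoding UTF8` writes the file with a byte order mark.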
Civette