I'm having a tedious problem with my shell script. It copies a file from another server to its own. The trouble is this:

The file to be copied has a special character in its name, like this: "CDACampaña". When I open my script file with the vi command, the name looks like CDACampaña.txt (with the cat command it looks correct), and when I run the script, the log shows the file name with nothing after CDACampa...

My script, as an example (this is not functional, just for illustration):

#Local machine
blabla code
cp //remote/CDACampaña.txt localfolder
bleble code

#Unix server vi command
blabla code
cp //remote/CDACampaña.txt localfolder
bleble code

#Unix log
blabla code
cp //remote/CDACampa

I tried uploading my script as UTF-8, UTF-8 without BOM, and ANSI, with both Unix and Windows line endings, but nothing works.

Any ideas?

EDITED:

Unix locale:

LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Local-pc encoding:

IsSingleByte      : True
BodyName          : iso-8859-1
EncodingName      : Europeo occidental (Windows)
HeaderName        : Windows-1252
WebName           : Windows-1252
WindowsCodePage   : 1252
IsBrowserDisplay  : True
IsBrowserSave     : True
IsMailNewsDisplay : True
IsMailNewsSave    : True
EncoderFallback   : System.Text.InternalEncoderBestFitFallback
DecoderFallback   : System.Text.InternalDecoderBestFitFallback
IsReadOnly        : True
CodePage          : 1252
Wesley Romero
  • You have failed to configure `vi` for UTF-8. – tripleee Oct 09 '18 at 15:46
  • I thought so too, but what about when it runs? It truncates everything after "CDACampa" – Wesley Romero Oct 09 '18 at 15:49
  • This is actually rather unclear, but the string is clearly UTF-8 which is viewed in some legacy 8-bit character set. Without information about the actual encodings involved, and/or the actual bytes in the file name, we can only speculate. Please [edit] your question to include pertinent details; see the guidance in the [Stack Overflow `character-encoding` tag info page](http://stackoverflow.com/tags/character-encoding/info). What are your `locale` settings, what behavior do you observe on the remote server, and what locally? – tripleee Oct 09 '18 at 15:50
  • Quite possibly the `ftp` server is too old and crufty to log UTF-8 strings correctly. Which platform and which version of `ftpd` or equivalent? – tripleee Oct 09 '18 at 15:53
  • If you want to get a plain ASCII encoding of a filename, `export LC_ALL=C; set -- CDACamp*.txt; printf '%q\n' "$@"`; the output will be usable through any 7-bit-clean channel. – Charles Duffy Oct 09 '18 at 15:55
  • Printing a UTF-8 character in a CP1252 terminal... well, yeah, that would cause your problem (in terms of how the output is printed, not any issue with the script itself). How about using a better terminal program? – Charles Duffy Oct 09 '18 at 16:02
  • I've edited the question to show the Unix locale and my local PC encoding, and also changed my "shell code". It's not an FTP problem, which is why I changed it to a copy command. – Wesley Romero Oct 09 '18 at 16:03
  • See https://stackoverflow.com/questions/19955385/utf-8-in-windows-7-cmd over on superuser. – Charles Duffy Oct 09 '18 at 16:04
  • ...and btw, I'm still waiting for the `LC_ALL=C; printf '%q\n' *.txt` output showing both local and remote names in unambiguous (not locale-dependent) form. That'll let us know if the names are making it across safely and just being displayed wrong, or actually getting munged. – Charles Duffy Oct 09 '18 at 16:05
  • BTW, to print the text of the remote script unambiguously, you can use `printf '%q\n' "$( – Charles Duffy Oct 09 '18 at 16:08
  • @CharlesDuffy you mean this output ? `$ LC_ALL=C; $ printf '%q\n' /cert/bcp/xcom/emic/CDACampaña.txt; $'/cert/bcp/xcom/emic/CDACampa\361a.txt'` – Wesley Romero Oct 09 '18 at 16:09
  • Yes, that's helpful! `$'\361'` is the character also known as `$'\xf1'`, or the extended-ascii "latin small letter n with tilde" as documented at https://www.fileformat.info/info/unicode/char/f1/index.htm – Charles Duffy Oct 09 '18 at 16:14
  • ...so, you should be able to write `$'\361'` or `$'\xf1'` in your scripts in the place where the character needs to be to refer to it; as in `cp $'//remote/CDACampa\361a.txt' localfolder` – Charles Duffy Oct 09 '18 at 16:15
  • Using quotes, right? Like this: `$cp '//remote/CDACampa\361a.txt' localfolder` or `$cp '//remote/CDACampa\xf1a.txt' localfolder` – Wesley Romero Oct 09 '18 at 16:20
  • The `$` is part of the quoting for the string (not used to illustrate a prompt), and needs to be right before the `'`. See http://wiki.bash-hackers.org/syntax/quoting#ansi_c_like_strings for a description of the syntax. – Charles Duffy Oct 09 '18 at 16:20
  • ...you can also change quoting types in the middle of a string, so `cp //remote/CDACampa$'\361'a.txt localfolder` is also valid (most of the string being unquoted, then that one character being in an ANSI-C-like quoting context, then writing the rest of it in an unquoted context again). – Charles Duffy Oct 09 '18 at 16:23

1 Answer

You can use `printf '%q\n' CDACamp*.txt` after setting `LC_ALL=C` to see how your filename would be rendered in a 7-bit-clean ASCII character set (thus, one which will render correctly on pretty much any terminal).
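
As a minimal sketch of that check, assuming bash: the helper name `show_ascii_names` is hypothetical, and the subshell is only there so the locale change does not leak into the rest of the session.

```shell
#!/usr/bin/env bash

# Print each argument in a locale-independent, 7-bit-clean form.
# show_ascii_names is a hypothetical helper name, not a standard command.
show_ascii_names() {
  (
    export LC_ALL=C          # applies only inside this subshell
    printf '%q\n' "$@"       # bash's %q escapes non-ASCII bytes, e.g. \361
  )
}

# Run it in the directory that holds the file.
show_ascii_names CDACamp*.txt
```

If the glob matches nothing, bash passes the pattern through literally, so seeing an escaped `*` in the output means no file was found in the current directory.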

If the output is:

$'/cert/bcp/xcom/emic/CDACampa\361a.txt'

...that's a value you can put in your script (so long as it's run with `#!/bin/bash` or `#!/usr/bin/env bash`, not `#!/bin/sh`):

cp $'/cert/bcp/xcom/emic/CDACampa\361a.txt' localfolder
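
For illustration, the octal escape, the hex escape, and the mid-string quoting switch mentioned in the comments should all produce the same bytes. A small sketch, assuming bash; the variable names are made up here.

```shell
#!/usr/bin/env bash

# Three spellings of the same name; \361 (octal) and \xf1 (hex) are the same byte.
a=$'CDACampa\361a.txt'       # whole string in ANSI-C quoting
b=$'CDACampa\xf1a.txt'       # hex escape instead of octal
c=CDACampa$'\361'a.txt       # only the one character in ANSI-C quoting

if [ "$a" = "$b" ] && [ "$b" = "$c" ]; then
  echo "all three spellings are identical"
fi
```

Note that the `$` must sit directly before the opening `'`; a plain `'...\361...'` keeps the backslash sequence as literal text.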

Because all the characters involved are 7-bit clean, they'll look the same whether or not your terminal supports Unicode (or extended ASCII) correctly.
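
If it is unclear which encoding a name actually uses, dumping its bytes can help. A sketch, assuming bash and a POSIX `od`; `hex_bytes` is a hypothetical helper. A single `f1` byte would suggest ISO-8859-1/Windows-1252, while `c3 b1` is how UTF-8 encodes ñ.

```shell
#!/usr/bin/env bash

# Print the bytes of a string as lowercase hex, without separators.
# hex_bytes is a hypothetical helper name.
hex_bytes() {
  printf '%s' "$1" | od -An -tx1 | tr -d ' \n'
  echo
}

hex_bytes $'\361'        # ñ as one Latin-1 byte: f1
hex_bytes $'\303\261'    # ñ as two UTF-8 bytes: c3b1
```

The same helper can be pointed at a real name, for example inside a `for f in CDACamp*.txt` loop, to see what is stored on disk.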

Charles Duffy