
I want to cut each line at the ":" delimiter. The input file is in the following format:

data1:data2
data11:data22
...

I have a Linux command:

cat merged.txt | cut -f1 -d ":" > output.txt

In the Mac terminal it gives an error:

cut: stdin: Illegal byte sequence

What is the correct way to do it in the Mac terminal?

Darth Vader
  • What's the output of `file merged.txt`? – Benjamin W. May 01 '19 at 18:49
  • second part after colon – Darth Vader May 01 '19 at 18:54
  • This would appear to be an issue with your input file, not `cut` itself. – chepner May 01 '19 at 18:55
  • check https://stackoverflow.com/questions/19242275/re-error-illegal-byte-sequence-on-mac-os-x – P.... May 01 '19 at 19:01
  • No, I meant what is the output of the command `file` applied to `merged.txt`. It'll tell you what your system thinks that file is (ASCII or not, carriage returns etc.) – Benjamin W. May 01 '19 at 19:04
  • Sorry, I don't quite understand the question. The merged.txt file has strings separated by a colon. I need to cut everything before the colon and the colon itself, and output everything after it to a new file. – Darth Vader May 01 '19 at 19:13
  • Possible duplicate of [RE error: illegal byte sequence on Mac OS X](https://stackoverflow.com/questions/19242275/re-error-illegal-byte-sequence-on-mac-os-x) – Bruno Leveque May 01 '19 at 20:48

2 Answers


Your input file (merged.txt) probably contains bytes or byte sequences that are not valid in your current locale. For example, your locale might specify UTF-8 character encoding, but the file might be in some other encoding and so cannot be parsed as valid UTF-8. If this is the problem, you can work around it by telling cut to assume the "C" locale, which basically tells it to process the input as a stream of bytes without paying attention to encoding.

BTW, `cat file |` is what's commonly referred to as a Useless Use of Cat (UUOC): you can just use a standard input redirect `< file` instead, which is cleaner and more efficient. Thus, my version of your command would be:

LC_ALL=C cut -f1 -d ":" < merged.txt > output.txt

Note that since the LC_ALL=C assignment is a prefix to the cut command, it only applies to that one command and won't mess up other operations that should assume UTF-8 (or whatever your normal locale is).
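To make the behaviour concrete, here is a minimal sketch that builds a small test file containing a byte that is not valid UTF-8 and then runs the command above on it. The file names sample.txt, output.txt and after_colon.txt are hypothetical, and the 0xE9 byte (Latin-1 "é") is only an assumed stand-in for whatever is actually in merged.txt. The last command uses -f2- to keep everything after the first colon, which is what the comments on the question suggest was wanted.

```shell
# Hypothetical sample: the first line contains the byte 0xE9 (octal 351),
# which is "é" in Latin-1 but an invalid sequence in UTF-8.
printf 'caf\351:data2\ndata11:data22\n' > sample.txt

# Treat the input as raw bytes. The LC_ALL=C prefix affects only this
# one cut invocation, not the rest of the shell session.
LC_ALL=C cut -f1 -d ":" < sample.txt > output.txt

# Keep everything after the first colon instead of the part before it.
LC_ALL=C cut -f2- -d ":" < sample.txt > after_colon.txt
```

The same prefix works with the original merged.txt in place of sample.txt.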

Gordon Davisson

Your cut command works for me on my Mac. You can try awk for the same result:

awk -F: '{print $1}' merged.txt

data1
data11
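
As a rough sketch building on the awk command above: if awk were to complain about the same bytes (an assumption, not something reported in this thread), the LC_ALL=C prefix can be applied to it as well. The file names sample_awk.txt, first.txt and rest.txt are hypothetical. The second command prints everything after the first colon, which is what the comments on the question ask for.

```shell
# Hypothetical sample in the format shown in the question.
printf 'data1:data2\ndata11:data22\n' > sample_awk.txt

# First field, processed as raw bytes.
LC_ALL=C awk -F: '{print $1}' sample_awk.txt > first.txt

# Everything after the first colon: delete the leading run of
# non-colon characters together with the colon, then print the rest.
LC_ALL=C awk '{sub(/^[^:]*:/, ""); print}' sample_awk.txt > rest.txt
```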
Adam vonNieda