
I'm merging some tab-delimited files, and the printed output looks incorrect, but if I inspect the string in a REPL it looks fine. Here's how it looks:

fh = open('out.vcf')
i = 0  # 1-based line counter
for line in fh:
    i += 1
    if i == 29401:
        print(line)

 
AAEX03025909.1  1068    .   T   C                       0   42  5

Then looking at it without print:

line
'AAEX03025909.1\t1405\t.\tC\tT\t\t\t\t\t\t0\t0\t0\t0\t0\t0\t0\t0\t10\t9\n'

When I look at out.vcf in less, it looks like the output of print. Why am I getting different outputs? I want the string that is produced without print. Using a comma instead of a tab solves the problem, but I'd like to keep the file tab-delimited.

econ
    Simple rule: if you want to print something you use `print()`. On the other hand, if you want to inspect a variable for debugging in interactive mode you can just enter the name to get a *technical representation* of the value. – Klaus D. Jun 21 '21 at 04:18

1 Answer


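There's always going to be some difference between how data is stored and how it's represented. Practically, the values are stored as binary but displayed according to some convention. In this case, you're seeing \t (the tab character, ASCII 9) represented in two ways.

print() writes the string's actual characters to the output, so each tab is rendered as whitespace, and consecutive tabs around empty fields look like one wide gap. Simply echoing the variable in the REPL shows the Python repr() of the string, where each tab appears as the escape sequence \t. The underlying string is the same in both cases.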

>>> "\t"
'\t'
>>> ord("\t")
9
>>> print("\t")

>>> repr("\t")
"'\\t'"
>>> print(repr("\t"))
'\t'
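
If you want to convince yourself that the data is intact, here is a minimal sketch. It assumes the sample line copied from the repr in the question, and the tab width of 8 is only a guess at how your terminal or less is configured:

```python
# Sample line copied from the repr shown in the question.
line = 'AAEX03025909.1\t1405\t.\tC\tT\t\t\t\t\t\t0\t0\t0\t0\t0\t0\t0\t0\t10\t9\n'

# Splitting on the tab character recovers every column, including the empty
# ones that print() renders as one wide run of whitespace.
fields = line.rstrip('\n').split('\t')
print(len(fields), fields)

# Joining the columns back together reproduces the original line exactly,
# so the empty fields are preserved rather than lost.
rebuilt = '\t'.join(fields) + '\n'
print(rebuilt == line)

# Roughly what a terminal does with tabs: pad to the next tab stop.
# Tab stops every 8 columns is an assumption, not a property of the file.
print(line.expandtabs(8), end='')
```

So the file is still properly tab-delimited. If you need to see the delimiters while debugging, print(repr(line)) shows them in the terminal as well.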
ti7