
I'm using Python in VS Code (VSC) for a project that requires me to hash a whole file (46 KB, 5579 lines) line by line. However, the VSC terminal only shows the hashes of the last 1012 lines. I don't know what is happening and I could not fix it. This is my code:

import hashlib
def SHA1_hash(string):
    hash_obj = hashlib.sha1(string.encode())
    return(hash_obj.hexdigest())

with open("Project file/dict.txt") as f: 
    for line in f.readlines():
        print(SHA1_hash(line))

Picture of my text file starting from "writings":

The output should start with:

5AD930D43A7851DC6649558BA6BEDD44F14E737C
(hash SHA-1 of "writings")

However, it is like this:

Picture of the output terminal; the output starts with a different hash string:

Why is this happening? Why am I getting the wrong hash as output?

  • Did you try reading the file as binary data so you don't have to encode the string? – Matthias Apr 14 '22 at 22:57
  • You might be running into the VSC terminal's limit on the number of lines it keeps (its scrollback). Try outputting to a file. – SuperStormer Apr 14 '22 at 22:57
  • Why don't you edit your question from 2 hours ago? Don't post screenshots of text; do you expect me to retype them to try to reproduce the problem? Code and text should be put in code blocks. – rioV8 Apr 14 '22 at 22:58
  • Previous question: https://stackoverflow.com/q/71877502/9938317 – rioV8 Apr 14 '22 at 23:02
  • Please attach the text file. Sometimes a few weird characters are thrown in and end up altering the hash. Try running `.strip()` on the string before encoding it. – Xiddoc Apr 14 '22 at 23:06
  • You're not hashing the file correctly. Instead of just reposting your question, see [the duplicate](https://stackoverflow.com/questions/22058048/hashing-a-file-in-python) which I linked in your previous question. – aneroid Apr 15 '22 at 00:10
  • Please don't post images of code, data, or Tracebacks. Copy and paste it as text then format it as code (select it and type `ctrl-k`) ... [Discourage screenshots of code and/or errors](https://meta.stackoverflow.com/questions/303812/discourage-screenshots-of-code-and-or-errors)...[Why not upload images of code on SO when asking a question?](https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-on-so-when-asking-a-question) ... [You should not post code as an image because:...](https://meta.stackoverflow.com/a/285557/2823755) – wwii Apr 15 '22 at 00:24
  • @aneroid That hashes the whole file at once, instead of each line. – Kelly Bundy Apr 15 '22 at 00:26
  • The tool to use is a debugger: you can step line by line and inspect the contents of the variables. **A VERY HANDY TOOL**, and a lot faster than letting other people do the work. – rioV8 Apr 15 '22 at 02:21
  • `string` is the name of a standard-library module, so it is not a great name for your variable. – rioV8 Apr 15 '22 at 02:23
  • For old-school `print` debugging, add the line `print(repr(line))` to the `for` loop; to limit the output while debugging, use `for line in f.readlines()[:5]:`. – rioV8 Apr 15 '22 at 10:51
  • Another tip: don't use spaces in a filename. One day it will bite you. – rioV8 Apr 15 '22 at 10:55
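Two of the suggestions in the comments above (writing the output to a file instead of the terminal, and calling `.strip()` before encoding) can be combined into a minimal sketch. The function name `write_line_hashes`, the output file name, and the UTF-8 encoding are assumptions made for illustration; they do not come from the question. The returned line count gives a quick check that every line was processed, independent of how many lines the terminal keeps.

```python
import hashlib


def write_line_hashes(src_path, dst_path):
    """Write one uppercase SHA-1 digest per input line to dst_path.

    Returns the number of lines that were hashed.
    """
    count = 0
    # Assumption: the word list is UTF-8 text.
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for count, line in enumerate(src, start=1):
            word = line.strip()  # drop the trailing newline and surrounding spaces
            digest = hashlib.sha1(word.encode("utf-8")).hexdigest().upper()
            dst.write(digest + "\n")
    return count
```

Calling `write_line_hashes("Project file/dict.txt", "hashes.txt")` should then return the number of lines in the file (5579 for the file described in the question), and `hashes.txt` can be opened in the editor without any scrollback limit getting in the way.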

1 Answer


Because of the \n at the end of every line (except possibly the last one), you do not get 5AD930D43A7851DC6649558BA6BEDD44F14E737C for "writings": the newline is hashed together with the word.

Replacing print(SHA1_hash(line)) with print(SHA1_hash(line.strip())) solves the problem.
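A small sketch of why the trailing newline matters: the same word hashed with and without its "\n" gives two unrelated digests. The helper name `sha1_hex` is made up for this illustration. Note also that `hexdigest()` returns lowercase hex, so `.upper()` is needed to match the uppercase form quoted in the question.

```python
import hashlib


def sha1_hex(text):
    """Return the uppercase SHA-1 hex digest of a string (encoded as UTF-8)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest().upper()


line = "writings\n"                       # what iterating over the file yields
with_newline = sha1_hex(line)             # hashes "writings" plus the newline
without_newline = sha1_hex(line.strip())  # hashes only "writings"

print(with_newline == without_newline)    # False
```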

But I must admit I cannot reproduce your output. Could you check the contents of the dict.txt file?

Steven-MSFT
  • `SHA1_hash('writings')` gives you `5ad93.......` – rioV8 Apr 15 '22 at 09:41
  • @rioV8 Yes, it does. I think he has some indentation or spaces in the txt file. – Steven-MSFT Apr 15 '22 at 10:11
  • I once had trouble finding some text in a text file. It turned out there were zero-width spaces in the file; you don't see them in a regular editor, but in a hex editor it was clear, and `print(repr(line))` was also helpful. – rioV8 Apr 15 '22 at 10:54
  • @rioV8 perfect. – Steven-MSFT Apr 15 '22 at 11:05
  • Thank you so much. I think this is the answer. As for dict.txt, I did not copy the whole file with its more than 5000 lines, which caused the misunderstanding; sorry for that. – Nghia Nguyen Apr 15 '22 at 16:38
  • @NghiaNguyen A F* simple `print(i)` in your program would have shown you: you only process 100 lines and not 5000 – rioV8 Apr 16 '22 at 13:34
  • @rioV8 What is `i`? And how did they get output for 1012 lines when they only process 100? – Kelly Bundy Apr 16 '22 at 15:39
  • @KellyBundy `i` is the result of a use of `enumerate` and 100 is just an example number < 5000, then `i` would show `0..1011` and not `0...4999`. An indication you could try to solve yourself instead of letting others spend time trying to solve your .......... – rioV8 Apr 16 '22 at 17:00
  • @rioV8 Solve my what? – Kelly Bundy Apr 16 '22 at 17:08
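The debugging ideas from this comment thread (`enumerate`, `print(repr(line))`, and looking for invisible characters such as zero-width spaces) could be sketched as below. The function name `find_suspicious_lines` and the choice of which characters count as suspicious are assumptions made for this example. One detail worth knowing: `str.strip()` does not remove a zero-width space (U+200B), because Python does not treat it as whitespace, so a check like this can still find something after stripping.

```python
import unicodedata


def find_suspicious_lines(lines):
    """Return (line_number, repr(text)) for lines containing whitespace,
    control characters, or invisible format characters."""
    suspicious = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")  # the line ending itself is expected
        for ch in text:
            # "Cf" covers format characters such as U+200B (zero-width space);
            # "Cc" covers control characters.
            if ch.isspace() or unicodedata.category(ch) in ("Cf", "Cc"):
                suspicious.append((number, repr(text)))
                break
    return suspicious
```

Passing an open file object (for example `find_suspicious_lines(f)`) works the same way as passing a list of strings, and an empty result suggests the words contain nothing hidden.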