0

In my program I have a log directory. The name of the log directory is very long, so in my Python script I used the hash function to get a unique code and appended it to a fixed string, i.e.:

LOG_DIR = "abcdefghijklmnopqrstuvwxyz"
log_dir_hashed = hash(LOG_DIR)
new_log_dir = "log_%s" % log_dir_hashed

Since I am new to Python, please tell me if anything can go wrong with the above code. Also, how can I do a similar thing in a shell script, so that the directory name obtained after hashing is the same in Python and in the shell?

user1731553

2 Answers

4

hash() is an implementation detail of Python, and a class can override what it does by defining __hash__, so you shouldn't be using it like that. It also has some possibly surprising properties, such as:

# This is not a collision produced by the used hashing method, it is
# how hash() functions. The result though is a collision.
>>> hash(-2) == hash(-1)
True
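
As a further illustration of why hash() is a poor fit for directory names, here is a small sketch, assuming CPython 3.7 or later. In Python 3.3+ string hashes are randomized per process unless PYTHONHASHSEED is set (in Python 2 this randomization is off by default), so the same string can produce a different directory name on every run. hash() also returns a signed integer, so the name may contain a minus sign. The helper name `hash_in_fresh_interpreter` is made up for this example.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, seed):
    """Return hash(text) as computed by a new Python process.

    The child process is started with PYTHONHASHSEED set to `seed`,
    which fixes the otherwise random per-process string hash seed.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


LOG_DIR = "abcdefghijklmnopqrstuvwxyz"

same_seed_a = hash_in_fresh_interpreter(LOG_DIR, 1)
same_seed_b = hash_in_fresh_interpreter(LOG_DIR, 1)
other_seed = hash_in_fresh_interpreter(LOG_DIR, 2)

# Same seed: the two processes agree on the hash.
print(same_seed_a == same_seed_b)
# Different seed: the hash of the very same string is expected to change.
print(same_seed_a == other_seed)
```

Without a fixed seed, each run of the script behaves like the "different seed" case, so a directory created by one run would most likely not be found by the next.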

Use a well-known hash such as MD5 or SHA-1 instead. If you need cryptographically secure log directory names, choose a suitable hash accordingly. Have a look at https://docs.python.org/3/library/hashlib.html. These hashes have equivalent command-line tools available.

For example:

from hashlib import md5

log_dir_hashed = md5('abcdefghijklmnopqrstuvwxyz'.encode('utf-8')).hexdigest()
new_log_dir = "log_%s" % log_dir_hashed

Comparing the Python result:

>>> md5('abcdefghijklmnopqrstuvwxyz'.encode('utf-8')).hexdigest()
'c3fcd3d76192e4007dfb496cca67e13b'

and equivalent command line (one way to do it):

 % echo -n 'abcdefghijklmnopqrstuvwxyz' | md5sum - | awk '{print $1}'
c3fcd3d76192e4007dfb496cca67e13b
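
To make it easy to swap the hash later, the snippet above could be wrapped in a small helper. This is only a sketch: the function name `short_log_dir` and its `algorithm` parameter are invented for this example, and it assumes the chosen algorithm is available in your Python build.

```python
import hashlib


def short_log_dir(log_dir, algorithm="md5"):
    """Return "log_<hex digest>" for the given directory name."""
    # Hash the UTF-8 bytes of the name, without any trailing newline,
    # so the result matches `echo -n '<name>' | md5sum` in the shell.
    digest = hashlib.new(algorithm, log_dir.encode("utf-8")).hexdigest()
    return "log_%s" % digest


LOG_DIR = "abcdefghijklmnopqrstuvwxyz"

print(short_log_dir(LOG_DIR))  # log_c3fcd3d76192e4007dfb496cca67e13b
print(short_log_dir(LOG_DIR, "sha256"))
```

If you switch the algorithm here, switch the command-line tool as well (for example sha256sum instead of md5sum from GNU coreutils), otherwise the two names will no longer match.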
Ilja Everilä
  • yes I forgot the quotes, have corrected the question. I am doing this hash just to reduce the length of filename... nothing big. – user1731553 Apr 20 '16 at 07:39
  • "This is not a collision, it is how hash() functions" - no, it is a collision. It may also be how `hash()` functions, but that doesn't make it not a collision. – user2357112 Apr 20 '16 at 07:43
  • 1
    @user2357112 It is not a collision in the sense of "the used hashing method produces the same value for these 2 values", but as in "this function can not return -1 as a valid return value". Yes, they will collide, but not in the traditional sense of hash collisions. – Ilja Everilä Apr 20 '16 at 07:45
  • Thanks for the solution! – user1731553 Apr 20 '16 at 09:58
  • md5.new("logdir".encode('utf-8')).hexdigest() gives ee6da4c228cfaebfda7f14e4371a097d in Python, and `echo "logdir" | md5sum - | awk '{print $1}'` gives aba76197efa97e6bd4e542846471b391 on Linux. They give me different results... can you please help @llja – user1731553 Apr 20 '16 at 20:12
  • 1
    You're missing `-n` from echo. It is adding a newline after "logdir". Compare your command line command to `md5.new("logdir\n").hexdigest()`. Btw as it seems you're using python 2, you don't have to `.encode()`, as strings in python 2 are byte strings to begin with. It'd be a different story if you'd use unicode strings (`u"logdir"` in python 2). – Ilja Everilä Apr 20 '16 at 20:20
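
The newline issue from the last two comments can be reproduced in Python alone. A minimal sketch using hashlib (rather than the Python 2 `md5` module used in the comment), where the variable names are only illustrative:

```python
from hashlib import md5

name = "logdir"

# What Python hashes: just the bytes of the name.
without_newline = md5(name.encode("utf-8")).hexdigest()

# What `echo "logdir" | md5sum` hashes: the name plus the trailing
# newline that echo appends when -n is not given.
with_newline = md5((name + "\n").encode("utf-8")).hexdigest()

print(without_newline)
print(with_newline)
print(without_newline == with_newline)
```

The two digests differ, which is why the shell command needs `echo -n` (or `printf '%s'`) to agree with the Python result.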
-3

Hash is doing encryption to your directory, basically converting your data to MD5/SHA or other encryption.

You can use crypt(Data) in shell script to get the same results.

Eg.

log_dir_hashed=crypt(LOG_DIR)