
I am trying to search for a string in the files inside a ZIP archive that contains n files. If a file contains the string, that file has to be moved to a specific location.

import zipfile,os,shutil

f = []
files = 'Contains given substring'
os.chdir(r'C:\Users\Vishali\Desktop\PY\POC')

archive = zipfile.ZipFile('PY.zip')
print(archive.namelist())

for n in archive.namelist():
    print(n)

    f1 = archive.open(n,'r')
    re = f1.readlines()
    print(files)
    print(re)
    if files in re:
        shutil.copy(n,r'C:\Users\Vishali\Desktop\PY\s')
        f.append(f1)

print(f)

However, even when the string is present in a file, it is not detected, and `f` remains an empty list.

Peter Mortensen
Lucazade
  • What do you want to check for? If `re` contains a string that contains the given substring or if one of the strings in `re` _exactly equals_ the `files` string? Or maybe something else? – ForceBru Aug 19 '19 at 18:11
  • If 'Zap.zip' is my ZIP file and it contains 3 files named 'first.txt', 'second.txt' and 'third.txt', I want to check which file contains the string I am searching for. For example, if I search for the string 'hello' and it is present in 'second.txt', I want to print the name of that file and also move it to a specific location. – Lucazade Aug 19 '19 at 18:15
  • Currently `files in re` checks whether the exact string is an element of the `re` list; it is not a substring match. – C.Nivs Aug 19 '19 at 18:16
  • @nameless13, then you can just do `if files in f1.read()` – ForceBru Aug 19 '19 at 18:19
  • @ForceBru: this is the error I get when I use read(): `if files in f1.read(): TypeError: a bytes-like object is required, not 'str'`. `r = f1.readlines()` returns a list, and I cannot find a way to search for a string in that list. – Lucazade Aug 19 '19 at 18:22
  • @nameless13, then `files` should be `bytes`, like: `files = b"the actual thing"` – ForceBru Aug 19 '19 at 18:24
  • Rename your variables. The names you've given them do not appear to reflect what they represent. This makes understanding the intentions of your code much more difficult. – jpmc26 Aug 19 '19 at 18:25
  • I have used this question as an example in [a discussion](https://meta.stackoverflow.com/q/388663/1394393) about a common, larger issue facing this community. – jpmc26 Aug 19 '19 at 22:41
  • It is not clear how you want to handle line endings. Are newline characters forbidden in your search string? Or can a search string include them and match across multiple lines? If they can include them, do they have to match exactly, or do you need to normalize them somehow? – jpmc26 Aug 20 '19 at 01:05
  • In code questions, please give a minimal reproducible example: cut & paste & runnable code; example input with desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. – philipxy Aug 20 '19 at 08:01

1 Answer


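`re` is a list of lines, each a `bytes` object, so `files in re` checks whether one whole line equals `files`; it does not search inside the lines. I have incorporated feedback from @jpmc26 into my original answer.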

Change this:

if files in re:
    shutil.copy(n,r'C:\Users\Vishali\Desktop\PY\s')
    f.append(f1)

to this:

decode = ''
for lines in re:
    decode = decode + lines.decode('utf-8')
if files in decode:
    shutil.copy(n,r'C:\Users\Vishali\Desktop\PY\s')
    f.append(f1)

This decodes the lines returned by `zipfile` (assuming the file is UTF-8 encoded) and searches the resulting text directly. That avoids the quotes, commas, and escape sequences that calling `str()` on the list would add, which could otherwise cause false positives.
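Putting the comments and this answer together, here is a hedged sketch of the whole task from the question: report which archive members contain the search text and copy each match to a destination folder. It assumes every member is UTF-8 text, decodes each member once with `read()` instead of concatenating lines, and uses `ZipFile.extract` rather than `shutil.copy(n, ...)`. The reason for that swap: `n` is a name inside the archive, not a path on disk, so `shutil.copy` only works if a file with that name happens to exist in the current directory. The helper `find_and_extract` and its parameters are hypothetical, not from the original post.

```python
import zipfile


def find_and_extract(zip_path, needle, dest_dir, encoding="utf-8"):
    """Return the names of archive members whose text contains `needle`,
    extracting each matching member into `dest_dir`.

    Hypothetical helper: assumes every member is text in `encoding`;
    undecodable bytes are replaced instead of raising an error.
    """
    matches = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries have no content to search
            with archive.open(info) as member:
                # Decode the whole member once; newlines are kept, so the
                # needle cannot match across a line break that was removed.
                text = member.read().decode(encoding, errors="replace")
            if needle in text:
                matches.append(info.filename)
                # extract() writes the member (including any folders inside
                # the archive) under dest_dir and leaves the archive unchanged.
                archive.extract(info, dest_dir)
    return matches
```

With the example from the comments, `find_and_extract('Zap.zip', 'hello', dest)` would return `['second.txt']` if only `second.txt` contains `hello`. Note that this copies the member out; truly "moving" it would also require rewriting the archive without that member.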

Peter Mortensen
brazosFX
  • `str` is not the proper mechanism to combine a list of strings. – jpmc26 Aug 19 '19 at 18:26
  • Sure is sufficient to evaluate the truth of an if statement though. The answer is correct. – brazosFX Aug 19 '19 at 18:27
  • It introduces extra characters that are not part of the original content, such as quotes and commas. This can result in a false positive. It is not correct. – jpmc26 Aug 19 '19 at 18:28
  • @jpmc26 - I am assuming you are right. I see the extra output in the string. I will update the answer or comment with a better way once I find it. Thanks. Or add it in a comment and save me the search? ...and give my vote back! I am a new contributor and you are supposed to be nice! Ha ha – brazosFX Aug 19 '19 at 18:37
  • You don't need to take my word for it. Just observe the output of `str` on a list in the REPL. – jpmc26 Aug 19 '19 at 18:40
  • The results are even worse if the file contains non-ASCII, control characters, or backslashes. It generates escape sequences in the string you're checking against. For example, `str(['\\'])` doubles the slashes in the repr. – jpmc26 Aug 19 '19 at 20:21
  • @jpmc26, updated answer after some research. Upvote if you approve, otherwise please comment. Thx for the help. – brazosFX Aug 19 '19 at 22:24
  • Dropping the newlines can also create false positives. Consider searching for `'abc'` when the file contains `'ab\nc'`. Also, concatenation is [generally not a good way of combining many strings](https://stackoverflow.com/a/52561012/1394393). You may also want to see some things I noted in the Meta discussion I linked above. – jpmc26 Aug 19 '19 at 23:12
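The false positives described in these comments are easy to reproduce. A short sketch with made-up, already-decoded lines (the names are illustrative) compares `str()` on the list, joining with the newlines stripped, and joining the lines unchanged:

```python
lines = ["ab\n", "c\n"]

# 1. str() on a list adds brackets, quotes, commas and escape sequences,
#    none of which are in the file.
flattened = str(lines)
print(flattened)            # ['ab\n', 'c\n']
print("', '" in flattened)  # True, although the file contains no quotes
print("\\n" in flattened)   # True: each newline became a backslash and 'n'

# 2. Stripping newlines before joining glues separate lines together.
stripped = "".join(line.rstrip("\n") for line in lines)
print("abc" in stripped)    # True, although no line contains "abc"

# 3. Joining the lines unchanged keeps the real content, and str.join
#    avoids the repeated "+" concatenation the comment warns about.
joined = "".join(lines)
print("abc" in joined)      # False
print("ab\nc" in joined)    # True
```

Only the third form searches exactly what is in the file; if a search string should match across line breaks regardless of `\n` versus `\r\n`, the line endings would need to be normalized first, as asked in the comments on the question.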