I am writing a management command (run via manage.py importfiles) that imports a given directory structure from the real file system into my self-written file storage in Django.

def _handle_directory(self, directory_path, directory):
    for root, subFolders, files in os.walk(directory_path):
        for filename in files:
            path = os.path.join(root, filename)
            with open(path, 'r') as f:
                file_wrapper = FileWrapper(f)
                self.cnt_files += 1
                new_file = File(directory=directory, filename=filename,
                                file=file_wrapper, uploader=self.uploader)
                new_file.save()

The full model can be found on GitHub. The full command is currently available on gist.github.com.

If you do not want to check the model: the attribute file of my File class is a FileField.

Copying the files seems to work, thanks to pajton. Nevertheless, I now get a new exception. I think there is a problem with the SQLite encoding, but I do not know how to fix it. The value of sys.getfilesystemencoding() is mbcs.

Traceback (most recent call last):
  File ".\manage.py", line 10, in <module>
    execute_from_command_line(sys.argv)
  File "C:\Python27\lib\site-packages\django\core\management\__init__.py", line 399, in execute_from_command_line
    utility.execute()
  File "C:\Python27\lib\site-packages\django\core\management\__init__.py", line 392, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "C:\Python27\lib\site-packages\django\core\management\base.py", line 242, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "C:\Python27\lib\site-packages\django\core\management\base.py", line 285, in execute
    output = self.handle(*args, **options)
  File "D:\Development\github\Palco\engine\filestorage\management\commands\importfiles.py", line 63, in handle
    self._handle_directory(args[0], root)
  File "D:\Development\github\Palco\engine\filestorage\management\commands\importfiles.py", line 75, in _handle_directory
    new_file.save()
  File "D:\Development\github\Palco\engine\filestorage\models.py", line 155, in save
    super(File, self).save(*args, **kwargs)
  File "C:\Python27\lib\site-packages\django\db\models\base.py", line 545, in save
    force_update=force_update, update_fields=update_fields)
  File "C:\Python27\lib\site-packages\django\db\models\base.py", line 573, in save_base
    updated = self._save_table(raw, cls, force_insert, force_update, using, update_fields)
  File "C:\Python27\lib\site-packages\django\db\models\base.py", line 635, in _save_table
    forced_update)
  File "C:\Python27\lib\site-packages\django\db\models\base.py", line 679, in _do_update
    return filtered._update(values) > 0
  File "C:\Python27\lib\site-packages\django\db\models\query.py", line 507, in _update
    return query.get_compiler(self.db).execute_sql(None)
  File "C:\Python27\lib\site-packages\django\db\models\sql\compiler.py", line 976, in execute_sql
    cursor = super(SQLUpdateCompiler, self).execute_sql(result_type)
  File "C:\Python27\lib\site-packages\django\db\models\sql\compiler.py", line 782, in execute_sql
    cursor.execute(sql, params)
  File "C:\Python27\lib\site-packages\django\db\backends\util.py", line 69, in execute
    return super(CursorDebugWrapper, self).execute(sql, params)
  File "C:\Python27\lib\site-packages\django\db\backends\util.py", line 53, in execute
    return self.cursor.execute(sql, params)
  File "C:\Python27\lib\site-packages\django\db\utils.py", line 99, in __exit__
    six.reraise(dj_exc_type, dj_exc_value, traceback)
  File "C:\Python27\lib\site-packages\django\db\backends\util.py", line 53, in execute
    return self.cursor.execute(sql, params)
  File "C:\Python27\lib\site-packages\django\db\backends\sqlite3\base.py", line 450, in execute
    return Database.Cursor.execute(self, query, params)
django.db.utils.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just switch your application to Unicode strings.

I changed filename in several ways, but it is always wrong. I also tried literal values like 'foo' or u'foo', as well as different combinations of .encode(), .decode() and unidecode. :(

I am pretty sure the problem is the filename. I printed the current values of filename, and the exception occurs whenever the filename contains non-ASCII characters.
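
One thing worth ruling out at this point is that filename reaches the model as a byte string. A minimal sketch, assuming Python 2 semantics where os.walk returns byte strings when it is given a byte-string path; to_text is a hypothetical helper, not part of Django or of the command above:

```python
import sys


def to_text(name, encoding=None):
    """Return name as a text (unicode) string, decoding byte strings if needed."""
    if isinstance(name, bytes):  # on Python 2, bytes is an alias for str
        # Assumption: names coming from os.walk use the file system encoding.
        fs_encoding = sys.getfilesystemencoding() or "utf-8"
        return name.decode(encoding or fs_encoding)
    return name
```

On Python 2, passing a unicode path to os.walk (for example to_text(args[0])) should make it yield unicode names in the first place, which would make the per-file decode step unnecessary.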

Update 1: I followed pajton's advice and logged the SQL queries. This is the result (the first line is the output of print filename). D:\temp\prak-gdv-abgabe is the argument I passed to this command.

Eigene L÷sung.pdf
(0.000) QUERY = u'BEGIN' - PARAMS = (); args=None
(0.000) QUERY = u'INSERT INTO "filestorage_file" ("directory_id", "filename", "file", "size", "content_type", "uploader_id", "datetime", "sha512") VALUES (%s, %s, %s, %s, %s, %s, %s, %s)' - PARAMS = (164, u'Eigene L\xf6sung.pdf', u'filestorage/5/5b/5bf32077-5531-4de0-95a7-d2ea3e10a17d.pdf', None, None, 8, u'2014-02-26 23:21:17.735000', None); args=[164, 'Eigene L\xc3\xb6sung.pdf', u'filestorage/5/5b/5bf32077-5531-4de0-95a7-d2ea3e10a17d.pdf', None, None, 8, u'2014-02-26 23:21:17.735000', None]
(0.000) QUERY = u'BEGIN' - PARAMS = (); args=None
(0.000) QUERY = u'UPDATE "filestorage_file" SET "directory_id" = %s, "filename" = %s, "file" = %s, "size" = NULL, "content_type" = %s, "uploader_id" = %s, "datetime" = %s, "sha512" = NULL WHERE "filestorage_file"."id" = %s ' - PARAMS = (164, u'D:\\Temp\\prak-gdv-abgabe\\Protokoll\\Eigene L\ufffdsung.pdf', u'filestorage/5/5b/5bf32077-5531-4de0-95a7-d2ea3e10a17d.pdf', u'application/pdf', 8, u'2014-02-26 23:21:17.735000', 156); args=(164, 'D:\\Temp\\prak-gdv-abgabe\\Protokoll\\Eigene L\xf6sung.pdf', u'filestorage/5/5b/5bf32077-5531-4de0-95a7-d2ea3e10a17d.pdf', 'application/pdf', 8, u'2014-02-26 23:21:17.735000', 156)
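
For reference, query logging of this kind can be switched on through Django's LOGGING setting. This is a hedged sketch and not necessarily the setup used for the output above; it assumes the dict lives in settings.py and that DEBUG = True, since Django only sends SQL to the django.db.backends logger in debug mode:

```python
import logging.config

# Assumption: this dict would live in settings.py as LOGGING.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        # Print log records to the console (stderr).
        "console": {"class": "logging.StreamHandler"},
    },
    "loggers": {
        # Django logs every executed SQL statement on this logger at DEBUG level.
        "django.db.backends": {
            "handlers": ["console"],
            "level": "DEBUG",
        },
    },
}

# Django applies settings.LOGGING itself; this call is only here so the
# sketch can be tried outside a project.
logging.config.dictConfig(LOGGING)
```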

Update 2: (2014-02-27 11:10 UTC) The encoding of my sqlite database is UTF-8 as verified by PRAGMA encoding;.
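
The same check can be done from Python with the standard sqlite3 module. A small sketch; database_encoding is a hypothetical helper and the path you pass in is whatever your settings point to:

```python
import sqlite3


def database_encoding(path):
    """Return the text encoding SQLite reports for the database at path."""
    connection = sqlite3.connect(path)
    try:
        # PRAGMA encoding returns a single row with a single text column.
        return connection.execute("PRAGMA encoding;").fetchone()[0]
    finally:
        connection.close()
```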

I checked the records of my database.

   Id   |   filename                                        |   sha512      |   size
    1   |   D:\Temp\prak-gdv-abgabe\Liesmich.html           |   ffeb8c3d5   |   5927
    2   |   D:\Temp\prak-gdv-abgabe\Liesmich.md             |   d206d241f   |   407
    3   |   D:\Temp\prak-gdv-abgabe\Liesmich.txt            |   d206d241f   |   407
    4   |   D:\Temp\prak-gdv-abgabe\Linux\GDV_Praktikum.bin |   5fc5749ee   |   166925
    5   |   Eigene Lösung.pdf                               |               |

It is very interesting that the failing entry (id 5) has the expected filename but no sha512 or size values set, while the other entries have the expected values for sha512 and size but not the expected filename (they contain the full path). It seems that the custom save() method of my File class is part of the problem, but I don't understand why these strange things happen.

tjati
  • Can you post SQL that is failing? Check this question (2nd answer) to see how to log SQLs: http://stackoverflow.com/questions/2314920/django-show-log-orm-sql-calls-from-python-shell – pajton Feb 26 '14 at 11:00
  • Thank you for your advice. I added the information you asked for. In addition, I added the file system encoding to my question (`mbcs`). – tjati Feb 26 '14 at 22:24
  • I updated my question with further information which may help to understand what happens here. – tjati Feb 27 '14 at 11:06
  • Hm, looks strange, cause first INSERT goes fine. I see you're doing 4 queries in your save(). Can you update the params before and just make one call to super.save() at the end? Could just solve this – pajton Feb 27 '14 at 12:32
  • Well, that does not help. Nevertheless, it improves the performance, I think. But I wonder why the second update has the full filename with path and not only the filename. This is really weird. – tjati Feb 27 '14 at 22:19
  • Hey, pajton, the journey is not over ;( https://stackoverflow.com/questions/22120478/django-copying-file-copies-only-a-part – tjati Mar 01 '14 at 21:15

1 Answer

Well, I found a ... solution. I improved the custom save() method of my File model. It no longer fires 3+ saves, only the initial save plus at most one update. And, this is the important change, that update writes only the three fields my custom save method fills in. My save method now looks like this:

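import hashlib
import mimetypes

def save(self, *args, **kwargs):
    # Initial save, so the file is stored before it is inspected.
    super(File, self).save(*args, **kwargs)
    do_update = False
    if not self.content_type:
        # Guess the MIME type from the stored file name.
        self.content_type = mimetypes.guess_type(self.file.name)[0]
        do_update = True
    if not self.sha512:
        # Hash the complete file content.
        self.sha512 = hashlib.sha512(self.file.read()).hexdigest()
        do_update = True
    if not self.size:
        self.size = self.file.size
        do_update = True

    if do_update:
        # Write back only the derived fields; filename and the rest stay untouched.
        super(File, self).save(update_fields=['content_type', 'sha512', 'size'], *args, **kwargs)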

Now the files are imported as expected!
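
The save() method above reads the whole file into memory to hash it. Here is a sketch of computing the same three values in chunks, assuming a file object opened in binary mode that supports seek(); describe_file and chunk_size are hypothetical names, not part of the model:

```python
import hashlib
import io
import mimetypes


def describe_file(name, fileobj, chunk_size=64 * 1024):
    """Return (content_type, sha512_hex, size) for a binary file object."""
    digest = hashlib.sha512()
    size = 0
    fileobj.seek(0)  # start from the beginning, whatever was read before
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
        size += len(chunk)
    fileobj.seek(0)  # leave the file ready for the next reader
    return mimetypes.guess_type(name)[0], digest.hexdigest(), size


# Usage with an in-memory stand-in for an uploaded file.
content_type, sha512, size = describe_file("report.pdf", io.BytesIO(b"abc"))
```

Counting the bytes while hashing also means the size is taken from the content that was actually read, which may help spot truncated reads.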

tjati
  • Glad you made it! What I suggested in previous comment above was to move `super(File,self).save()` call at the very bottom and have just one such call. That could work too. – pajton Feb 28 '14 at 09:36
  • I need the first `super` call because, if the file is new, I have to save it before I can run the file operations (`guess_type` and `sha512`). I know it's very ugly to have multiple `save()` calls, but in this case it seems to be necessary. Of course, my solution is not really the expected solution, but it improves the code quality and I no longer have the problem. Well, this is working for me, which is nice ;) – tjati Feb 28 '14 at 09:39
  • It is a bit strange, cause the file should be available regardless of whether you already saved...but maybe it needs to be accessed differently. Anyway, if you intend to keep the logic this way you could also take a look into signals. This update logic sounds like a very good place to use post_save signal. – pajton Feb 28 '14 at 09:45