
I wrote a PyDrive script which downloads all the files in a specific folder.

The docs get downloaded as 'sampleTitle.md' with the mimetype of 'text/plain'.

Then they are simply committed and pushed to my repo.

Here is my Python code for PyDrive:

import sys
# Make the Python 2.7 site-packages (where PyDrive is installed) importable.
sys.path.insert(1, '/Library/Python/2.7/site-packages')

from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive

# Drive mimetypes to download, mapped to the mimetype they are exported as.
mimetypes = {
    # Drive Document files as plain text.
    'application/vnd.google-apps.document': 'text/plain',
    # etc.
}

# Mimetypes that are folders and should be entered recursively.
folder = {
    # Comparing for folder.
    'application/vnd.google-apps.folder': 'true',
    # etc.
}

def checkFile(arg):
    """Download exportable files, enter folders, ignore everything else."""
    if arg['mimeType'] in mimetypes:
        print('The file ' + str(arg['title']) + ' has a mimetype of ' + arg['mimeType'] + ' and will be downloaded')
        downloadFile(arg)
    elif arg['mimeType'] in folder:
        print('The file ' + str(arg['title']) + ' has a mimetype of ' + arg['mimeType'] + ' and will be entered')
        enterFolder(arg['id'])

def enterFolder(query):
    """List all non-trashed children of the folder with id `query` and check each one."""
    file_list = drive.ListFile({'q': "'" + query + "' in parents and trashed=false"}).GetList()
    for file1 in file_list:
        checkFile(file1)

def downloadFile(arg):
    """Export a Drive file into the current directory using the mapped mimetype."""
    download_mimetype = mimetypes[arg['mimeType']]
    arg.GetContentFile(arg['title'], mimetype=download_mimetype)
    print(arg['title'] + ' got downloaded')

gauth = GoogleAuth()
gauth.LocalWebserverAuth()  # Creates local webserver and auto handles authentication.

# Create GoogleDrive instance with authenticated GoogleAuth instance.
drive = GoogleDrive(gauth)

# Walk all files below the folder 'starfolder', recursing into subfolders.
enterFolder('starfolder')

The code works and the files are downloaded.
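The checkFile/enterFolder pair is a depth-first walk over the folder tree. A minimal sketch of the same walk, with the Drive listing replaced by a hypothetical `list_children` callable so the logic can be tried without credentials; the made-up `TREE` data only mimics the `id`, `title` and `mimeType` fields the script uses:

```python
# Mimetypes used by the question's script.
DOC_MIMETYPE = 'application/vnd.google-apps.document'
FOLDER_MIMETYPE = 'application/vnd.google-apps.folder'

# Drive mimetype -> export mimetype, like the script's `mimetypes` dict.
EXPORTS = {DOC_MIMETYPE: 'text/plain'}


def walk(folder_id, list_children, found=None):
    """Depth-first walk that collects (title, export mimetype) pairs.

    `list_children(folder_id)` is a hypothetical stand-in for
    drive.ListFile({'q': ...}).GetList(); it must return dicts with
    'id', 'title' and 'mimeType' keys.
    """
    if found is None:
        found = []
    for item in list_children(folder_id):
        if item['mimeType'] in EXPORTS:
            found.append((item['title'], EXPORTS[item['mimeType']]))
        elif item['mimeType'] == FOLDER_MIMETYPE:
            # Recurse into the subfolder, like enterFolder(arg['id']).
            walk(item['id'], list_children, found)
        # Any other mimetype (images, PDFs, ...) is skipped.
    return found


# A made-up folder tree keyed by folder id, for trying the walk offline.
TREE = {
    'root': [
        {'id': 'a', 'title': 'post-one.md', 'mimeType': DOC_MIMETYPE},
        {'id': 'sub', 'title': 'drafts', 'mimeType': FOLDER_MIMETYPE},
        {'id': 'img', 'title': 'logo.png', 'mimeType': 'image/png'},
    ],
    'sub': [
        {'id': 'b', 'title': 'post-two.md', 'mimeType': DOC_MIMETYPE},
    ],
}

if __name__ == '__main__':
    print(walk('root', lambda fid: TREE.get(fid, [])))
    # [('post-one.md', 'text/plain'), ('post-two.md', 'text/plain')]
```

Passing the listing in as a function keeps the recursion testable; in the real script it would wrap the same `drive.ListFile({'q': ...}).GetList()` call that enterFolder uses.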

In Google Docs, the start of a file looks like this:

---  
layout: post
title: title
---

It's YAML front matter, which I need for Jekyll and GitHub Pages.

When I download the file and push it to my repo it looks like this:

·---  
layout: post
title: title
---

I really don't know where that centered dot comes from. It only appears on GitHub and is hidden in all of my editors (Atom, TextWrangler, Brackets, TextEdit, Visual Studio Code). Pressing Backspace where the dot should be seems to delete the hidden character. In nano it is shown as whitespace.

I have to remove this whitespace somehow because it breaks my Markdown formatting. Is there an effective solution?
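Because every editor hides the character, one way to find out what it is may be to look at the raw bytes at the start of the file instead of the rendered text. A small Python 3 sketch; `leading_bytes` is a hypothetical helper, and the sample file is made up, starting (for illustration) with the three bytes of a UTF-8 byte order mark followed by front matter:

```python
import os
import tempfile


def leading_bytes(path, n=8):
    """Return the first `n` bytes of a file as space-separated hex pairs."""
    with open(path, 'rb') as f:
        head = f.read(n)
    # In Python 3, iterating over a bytes object yields ints.
    return ' '.join('{:02x}'.format(b) for b in head)


if __name__ == '__main__':
    # A sample file shaped like an exported doc: hidden bytes + front matter.
    fd, sample = tempfile.mkstemp(suffix='.md')
    with os.fdopen(fd, 'wb') as f:
        f.write(b'\xef\xbb\xbf---\nlayout: post\n')
    print(leading_bytes(sample))  # ef bb bf 2d 2d 2d 0a 6c
    os.remove(sample)
```

A clean `---` line starts with `2d 2d 2d`; anything in front of it is the invisible character.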

Edit

I found the culprit: it's a BOM (byte order mark) at the start of the document. I'm now trying to remove it with a shell command, but I can't find one that works. I tried the following on an example file:

awk '{if(NR==1)sub(/^\xef\xbb\xbf/,"");print}' text.md > text.md
sed '1 s/\xEF\xBB\xBF//' < text.md > text.md

Both commands wipe out the complete content of the file instead of removing only the BOM.

Does anyone know what I'm doing wrong on the command line? Everyone else seems to get these commands working.

Opaldes
  • Possible duplicate of [Using awk to remove the Byte-order mark](http://stackoverflow.com/questions/1068650/using-awk-to-remove-the-byte-order-mark) – Waylan Jul 05 '16 at 19:51
  • I will look into it tomorrow. The awk command simply deletes my file, so let's see if the slightly different version works. – Opaldes Jul 05 '16 at 22:12
  • OK, my script works now after downloading the files without an extension and using the commands above to add the extension afterwards. – Opaldes Jul 06 '16 at 13:31
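Why the commands empty the file: the shell sets up the `> text.md` redirection before awk or sed even starts, and opening text.md for output truncates it to zero bytes. By the time the command reads text.md it is already empty, so the output is empty too. Writing to a different file, or reading the whole input before writing, avoids this. Python's `open(path, 'wb')` truncates in the same way, so a safe in-place version has to read first and write second. A minimal Python 3 sketch; `strip_bom_in_place` is a hypothetical name:

```python
import codecs
import os
import tempfile

BOM = codecs.BOM_UTF8  # b'\xef\xbb\xbf', i.e. U+FEFF encoded as UTF-8


def strip_bom_in_place(path):
    """Remove a leading UTF-8 BOM from `path`; return True if one was removed.

    The whole file is read *before* it is reopened for writing. Opening it
    for writing first would truncate it, the same mistake as
    `awk ... text.md > text.md`.
    """
    with open(path, 'rb') as f:
        data = f.read()
    if not data.startswith(BOM):
        return False
    with open(path, 'wb') as f:
        f.write(data[len(BOM):])
    return True


if __name__ == '__main__':
    fd, sample = tempfile.mkstemp(suffix='.md')
    with os.fdopen(fd, 'wb') as f:
        f.write(BOM + b'---\nlayout: post\ntitle: title\n---\n')
    print(strip_bom_in_place(sample))  # True
    with open(sample, 'rb') as f:
        print(f.read(4))               # b'---\n'
    os.remove(sample)
```

Like the `NR==1` and `^` parts of the awk command, this only removes a BOM that is the very first thing in the file. It is not atomic; for that, one would write to a temporary file and rename it over the original.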

1 Answer


When a file with mimetype "application/vnd.google-apps.document" is downloaded as "text/plain", a BOM gets inserted at the start of the file.

This BOM is shown as whitespace in nano and as · on GitHub.
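Instead of cleaning the files with a shell command afterwards, the BOM could also be stripped inside the PyDrive script right after each export. A sketch assuming Python 3, using the standard `utf-8-sig` codec, which drops a leading BOM when decoding. `download_without_bom` is a hypothetical replacement for the question's `downloadFile`, and `FakeDriveFile` only exists so the sketch runs without Google credentials:

```python
import io
import os
import tempfile

# Drive mimetype -> export mimetype, mirroring the question's `mimetypes` dict.
EXPORTS = {'application/vnd.google-apps.document': 'text/plain'}


def download_without_bom(drive_file, exports=EXPORTS):
    """Export a Drive file to disk, then rewrite it without a leading BOM.

    `drive_file` needs item access for 'title' and 'mimeType' and a
    GetContentFile(filename, mimetype=...) method, as used in the question.
    """
    path = drive_file['title']
    drive_file.GetContentFile(path, mimetype=exports[drive_file['mimeType']])
    # 'utf-8-sig' drops a BOM at the start of the data (if there is one);
    # newline='' leaves the original line endings untouched.
    with io.open(path, encoding='utf-8-sig', newline='') as f:
        text = f.read()
    # Plain 'utf-8' never writes a BOM.
    with io.open(path, 'w', encoding='utf-8', newline='') as f:
        f.write(text)
    return path


class FakeDriveFile(dict):
    """Offline stand-in for a PyDrive file whose export starts with a BOM."""

    def GetContentFile(self, filename, mimetype=None):
        self.requested_mimetype = mimetype
        with open(filename, 'wb') as f:
            f.write(b'\xef\xbb\xbf---\nlayout: post\n')


if __name__ == '__main__':
    folder = tempfile.mkdtemp()
    fake = FakeDriveFile(title=os.path.join(folder, 'post.md'),
                         mimeType='application/vnd.google-apps.document')
    path = download_without_bom(fake)
    with open(path, 'rb') as f:
        print(f.read(4))  # b'---\n'
    os.remove(path)
    os.rmdir(folder)
```

In the question's script, checkFile would then call `download_without_bom(arg)` where it currently calls `downloadFile(arg)`.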

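The following command for removing BOMs works as long as the output is written to a different file than the input. Redirecting to the input file itself (`> text.md`) makes the shell truncate text.md before awk reads it, which is why the whole content disappeared. Downloading the file without an extension and writing the output to the .md name avoids this.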

Not working (input and output are the same file):

awk '{if(NR==1)sub(/^\xef\xbb\xbf/,"");print}' text.md > text.md

Working for me (input and output are different files):

awk '{if(NR==1)sub(/^\xef\xbb\xbf/,"");print}' text > text.md
Opaldes