
I have a folder named SMITH_JOHN TAYLER_DAVID. I can find it by searching for SMITH or TAYLER, but not by any other part of the name, i.e.:

files().list(q="name contains 'SMITH'")  => OK
files().list(q="name contains 'TAYLER'") => OK
files().list(q="name contains 'JOHN'")   => No match!
files().list(q="name contains 'DAVID'")  => No match!
files().list(q="name contains 'MITH'")   => No match!

So it looks like I can only search for a word at the beginning of the folder name or after a space. Does this happen to you as well? What does "contains" mean in the REST API, and how does the Python client implement it (perhaps using "match" instead of "search")?

puravidaso
  • Although I'm not sure whether I could correctly understand your question, I proposed an answer. Could you please confirm it? If I misunderstood your question and that was not useful, I apologize. – Tanaike Feb 07 '22 at 04:50

1 Answer


So it looks like I can only search for a word at the beginning of the folder name or after a space. Does this happen to you as well?

I could reproduce the same behavior in my environment. In your situation, the folder name is SMITH_JOHN TAYLER_DAVID, and I think the _ might be the reason for this issue. For example, when the folder name is SMITH JOHN TAYLER DAVID, all of your search queries can retrieve the folder. It seems that a search term is matched only from the first character of the name or from a character following a space. I think this might be the current specification.
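
The observed behavior can be modeled locally with a rough sketch. This is only an assumption that mirrors the results reported in the question (the name is split into terms on spaces, and the query must be a prefix of a term); it is not how the Drive API is actually implemented, and the function name `looks_like_drive_contains` is hypothetical.

```python
def looks_like_drive_contains(name: str, query: str) -> bool:
    """Rough local model of the observed `name contains` behavior.

    Assumption: the name is split into terms on whitespace only (so `_`
    does not start a new term), and the query must be a prefix of a term.
    """
    return any(term.startswith(query) for term in name.split())


folder = "SMITH_JOHN TAYLER_DAVID"
for query in ["SMITH", "TAYLER", "JOHN", "DAVID", "MITH"]:
    # Under this model only SMITH and TAYLER match, as in the question.
    print(query, looks_like_drive_contains(folder, query))
```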

What does "contains" mean in the REST API, and how does the Python client implement it (perhaps using "match" instead of "search")?

"contains" is an operator of the Drive API search query. This search query is passed as the q parameter of the "Files: list" method.

For example, if you want to retrieve the folder named SMITH_JOHN TAYLER_DAVID by searching for the values 'SMITH','TAYLER','JOHN','DAVID','MITH' instead of using the exact-match query name='SMITH_JOHN TAYLER_DAVID', you can use the following flow.

  1. Retrieve the folder list using the search query name contains 'SMITH' and mimeType='application/vnd.google-apps.folder' and trashed=false.
    • With this, the folder list includes the folders whose names match SMITH.
  2. From the retrieved folder list, pick the folder whose name includes the values 'SMITH','TAYLER','JOHN','DAVID','MITH'.

With this flow, the folder SMITH_JOHN TAYLER_DAVID can be retrieved with one API call. A sample script for this flow using googleapis for Python is as follows.

Sample script:

from googleapiclient.discovery import build

drive_service = build('drive', 'v3', credentials=creds) # Please use your credential.

# One API call: fetch the folders whose names match the term 'SMITH'.
q = "name contains 'SMITH' and mimeType='application/vnd.google-apps.folder' and trashed=false"
response = drive_service.files().list(q=q, fields='files(id, name)', pageSize=1000).execute()

# Filter client-side: keep only names that contain every remaining substring.
search = ['TAYLER', 'JOHN', 'DAVID', 'MITH']
for file in response.get('files', []):
    name = file.get('name')
    if all(e in name for e in search):
        print(name)
  • When this script is run, the folder SMITH_JOHN TAYLER_DAVID is obtained.

  • This sample assumes that the number of folders matching SMITH is less than 1000. Please be careful about this.
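
If more than 1000 folders may match, the query has to be paged. The following is a minimal sketch, assuming the same `drive_service` object and `q` string as in the sample script. The helper name `list_all_folders` is hypothetical; `pageToken` and `nextPageToken` are the paging fields of "Files: list".

```python
def list_all_folders(drive_service, q):
    """Collect every folder matching q by following nextPageToken."""
    folders = []
    page_token = None  # None requests the first page
    while True:
        response = drive_service.files().list(
            q=q,
            fields='nextPageToken, files(id, name)',
            pageSize=1000,
            pageToken=page_token,
        ).execute()
        folders.extend(response.get('files', []))
        page_token = response.get('nextPageToken')
        if not page_token:  # no further pages
            return folders
```

The returned list can then be filtered by `name` in the same way as in the sample script. Each page costs one API call.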


Tanaike
  • I need to search using a substring I know but may not be at the beginning or after a space. In my SMITH_JOHN example, I only know the first name but do not know the last name, and I want to search by what I know. Of course, that was a simplified example. I could get ALL the names with few or no restrictions and then match them inside Python as you hinted, but that would be a huge waste of traffic, and I seriously wonder if this is a Python implementation bug as "contains" should match a substring in any position. – puravidaso Feb 07 '22 at 14:31
  • @puravidaso Thank you for replying. About `if this is a Python implementation bug as "contains" should match a substring in any position.`, I'm not sure whether this is a bug or the current specification. I apologize for this. About `I need to search using a substring I know but may not be at the beginning or after a space.`, in this case, how about removing `name contains 'SMITH' and ` from `q`? In this case, if the number of all folders is less than 1000, one API call is used. If the number of all folders is less than 10000, 10 API calls are used. If this was not useful, I apologize. – Tanaike Feb 08 '22 at 00:08
  • I believe this is a bug, as the [Drive API (v3)](https://developers.google.com/drive/api/v3/search-files) documentation says `Files with a name containing the words "hello" and "goodbye"`; it does not say at the beginning or after a space. To find out what the word "contain" means, one does not need to look any further than Python itself: `"SMITH".__contains__("MITH")` returns True! What you suggested was a valid workaround, but it is very resource intensive. – puravidaso Feb 08 '22 at 00:25
  • @puravidaso Thank you for replying. About `What you suggested was a valid workaround, but it is very resource intensive.`, this is due to my poor skill. I deeply apologize for this. I would be grateful if you can forgive my poor skill. I think that I have to study more. – Tanaike Feb 08 '22 at 00:37
  • It is not a problem with your skill; I believe it is a bug. I have created a bug report [here](https://github.com/googleapis/google-api-python-client/issues/1685). – puravidaso Feb 08 '22 at 00:58
  • @puravidaso Thank you for reporting it. – Tanaike Feb 08 '22 at 01:01
  • I did further testing and discovered that if the match starts from the beginning, it can match either the whole word ("SMITH") or a partial word ("SMIT"); otherwise, it matches the whole word only. The match is case insensitive. The result is the same for Python API and dash API, so I am going to report it as an API issue. – puravidaso Feb 13 '22 at 02:20
  • The API issue report is https://issuetracker.google.com/issues/219091193 – puravidaso Feb 13 '22 at 02:45
  • I received an update on the API bug report saying that this is the spec, so it won't be fixed, though I do not understand the rationale. Given that, the answer plus the comments form a complete answer to the question, so I will accept the answer. – puravidaso Feb 18 '22 at 02:25
  • @puravidaso Thank you for replying. I saw your issue tracker. For example, how about reporting it as a feature request? In this case, I thought that it might possibly be considered in a future update. – Tanaike Feb 18 '22 at 02:41
  • Feature request created: https://issuetracker.google.com/issues/220326928 – puravidaso Feb 19 '22 at 03:35
  • @puravidaso I added a star to it. – Tanaike Feb 19 '22 at 04:51