
I want to reorganize the logical paths in S3. Currently they look something like this:

a/01-2020/b/file.txt
a/01-2020/c/file2.txt
a/02-2020/b/file.txt
a/02-2020/c/file2.txt
...

For that, I'm looking for a regex that swaps the date in the second position (splitting on the / delimiter) with the third segment.

It should look something like this:

a/b/01-2020/file.txt
a/b/02-2020/file.txt
a/c/01-2020/file2.txt
a/c/02-2020/file2.txt
...

In Python, the code starts like this:

import boto3

s3_client = boto3.client('s3')
objs = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)['Contents']

for key in objs:
    print(key['Key'])
    print(reverse(key['Key']))  # reverse() is just a placeholder for the transformation I need
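S3 has no rename operation, so moving an object to its new key means copying it and then deleting the original. A minimal sketch, assuming s3_client is a boto3 S3 client like the one above; move_object, move_all, and new_key_for are hypothetical names, with new_key_for standing in for the reverse() placeholder:

```python
def move_object(s3_client, bucket, old_key, new_key):
    """Move one object within a bucket: copy it to new_key, then delete old_key."""
    if old_key == new_key:
        return  # already in place, nothing to do
    s3_client.copy_object(
        Bucket=bucket,
        CopySource={'Bucket': bucket, 'Key': old_key},
        Key=new_key,
    )
    # Only reached if copy_object returned without raising.
    s3_client.delete_object(Bucket=bucket, Key=old_key)


def move_all(s3_client, bucket, keys, new_key_for):
    """Move every key to new_key_for(key).

    new_key_for is a hypothetical stand-in for the reverse() placeholder.
    """
    # Materialize the keys first so that objects written under the same
    # prefix during the loop are not picked up by a lazy listing.
    for old_key in list(keys):
        move_object(s3_client, bucket, old_key, new_key_for(old_key))
```

copy_object copies objects up to 5 GB in a single request; for larger objects, boto3's managed s3_client.copy() performs the copy in parts.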
Wiktor Stribiżew
Nir

7 Answers

You could try this messy double str.join:

>>> s = '''a/01-2020/b/file.txt
a/01-2020/c/file2.txt
a/02-2020/b/file.txt
a/02-2020/c/file2.txt'''
>>> print('\n'.join('/'.join([i.split('/')[0], i.split('/')[2], i.split('/')[1], i.split('/')[3]]) for i in s.splitlines()))
a/b/01-2020/file.txt
a/c/01-2020/file2.txt
a/b/02-2020/file.txt
a/c/02-2020/file2.txt
>>> 

Inside the generator expression, I reorder the path segments of each line by index.
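Splitting each line once and unpacking the pieces avoids calling split('/') four times per line, and it also keeps any segments after the third one, which the indexing version drops. A small variation (swap_date_and_folder is a hypothetical name):

```python
def swap_date_and_folder(key):
    """Swap the 2nd and 3rd '/'-separated segments of key.

    Segments after the third (e.g. nested sub-folders) are kept in order.
    Raises ValueError if the key has fewer than three segments.
    """
    first, date, folder, *rest = key.split('/')
    return '/'.join([first, folder, date, *rest])


s = '''a/01-2020/b/file.txt
a/01-2020/c/file2.txt
a/02-2020/b/file.txt
a/02-2020/c/file2.txt'''

print('\n'.join(swap_date_and_folder(line) for line in s.splitlines()))
```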

U13-Forward

You can use regex too:

import re

newkeys = [re.sub(r'/([\d-]+)/(\w+)/', r'/\2/\1/', obj['Key']) for obj in objs]
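The pattern above matches a /digits-and-dashes/word/ run wherever it appears in the key, and \w+ does not match folder names that contain a hyphen or a dot. A stricter variant, assuming the date always has the MM-YYYY form from the question (reorder_key and KEY_PATTERN are hypothetical names), anchors the match at the start of the key:

```python
import re

# Anchored at the start of the key:
#   group 1: first segment          (e.g. 'a')
#   group 2: the MM-YYYY date       (e.g. '01-2020')
#   group 3: the folder to move up  (e.g. 'b'; may contain '-' or '.')
KEY_PATTERN = re.compile(r'^([^/]+)/(\d{2}-\d{4})/([^/]+)/')


def reorder_key(key):
    """Swap the date and folder segments; keys that don't match are returned unchanged."""
    return KEY_PATTERN.sub(r'\1/\3/\2/', key, count=1)
```

With the listing from the question, that would be newkeys = [reorder_key(obj['Key']) for obj in objs]. Because already-reorganized keys no longer have a date in the second position, running it twice leaves them as they are.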

However, a few points:

  1. Your listing might be incomplete: list_objects_v2 returns at most 1,000 keys per call, so for larger prefixes you need to paginate (for example, with s3_client.get_paginator('list_objects_v2')).
  2. I would also suggest reversing your date naming so that the most significant part (the year) comes first, and splitting it into %Y/%m for easier processing downstream.
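Both points can be combined in a short sketch: list every key through boto3's list_objects_v2 paginator, then rewrite a/MM-YYYY/b/file.txt as a/b/YYYY/MM/file.txt. It assumes s3_client is a boto3 S3 client; iter_keys and year_month_key are hypothetical names, and the target layout is one reading of the %Y/%m suggestion:

```python
from datetime import datetime


def iter_keys(s3_client, bucket, prefix=''):
    """Yield every key under prefix, following list_objects_v2 pagination."""
    paginator = s3_client.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # A page has no 'Contents' entry when no keys match.
        for obj in page.get('Contents', []):
            yield obj['Key']


def year_month_key(key):
    """Rewrite 'a/01-2020/b/file.txt' as 'a/b/2020/01/file.txt'."""
    first, date, folder, *rest = key.split('/')
    parsed = datetime.strptime(date, '%m-%Y')  # ValueError if not MM-YYYY
    return '/'.join([first, folder, parsed.strftime('%Y/%m'), *rest])


# Hypothetical usage (bucket and prefix as in the question):
# for key in iter_keys(s3_client, bucket, prefix):
#     print(key, '->', year_month_key(key))
```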
Pierre D

Since you're working with paths, why not use pathlib instead of regex or purely string-manipulation-based solutions?

from pathlib import Path

paths = (
    "a/01-2020/b/file.txt",
    "a/01-2020/c/file2.txt",
    "a/02-2020/b/file.txt",
    "a/02-2020/c/file2.txt"
)

rearranged = [Path(a, b, date, file) for a, date, b, file in map(lambda path: Path(path).parts, paths)]
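Path uses the operating system's separator, so on Windows the rearranged paths would be rendered with backslashes, which is not what S3 keys use. PurePosixPath always uses / and never touches the filesystem, so it may be a slightly safer fit for object keys. A sketch of the same rearrangement with it, converting back to plain strings:

```python
from pathlib import PurePosixPath

paths = (
    "a/01-2020/b/file.txt",
    "a/01-2020/c/file2.txt",
    "a/02-2020/b/file.txt",
    "a/02-2020/c/file2.txt",
)

# .parts splits on '/', e.g. ('a', '01-2020', 'b', 'file.txt')
rearranged = [
    str(PurePosixPath(a, b, date, file))
    for a, date, b, file in (PurePosixPath(path).parts for path in paths)
]
print(rearranged)
```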
Paul M.

If I understand what you are asking, you can do this:

paths = '''a/01-2020/b/file.txt
a/01-2020/c/file2.txt
a/02-2020/b/file.txt
a/02-2020/c/file2.txt'''

root = 'b'

def inject_path(paths):
    result = []
    for path in paths.splitlines():
        get_path = path.split('/')
        # Insert root as the new second segment; all existing segments are kept.
        get_path.insert(1, root)
        result.append('/'.join(get_path))
    return result

my_paths = inject_path(paths)
for path in my_paths:
    print(path)

Output

a/b/01-2020/b/file.txt
a/b/01-2020/c/file2.txt
a/b/02-2020/b/file.txt
a/b/02-2020/c/file2.txt

EDIT

For Loop + String Concatenation

def inject_path(paths):
    result = []
    for path in paths.splitlines():
        for idx, char in enumerate(path):
            if char == '/':
                # Rebuild the string with root inserted right after the first '/'.
                result.append(path[:idx] + path[idx] + root + path[idx:])
                break
    return result

my_paths = inject_path(paths)

for path in my_paths:
    print(path)
Federico Baù
t = """a/01-2020/b/file.txt
a/01-2020/c/file2.txt
a/02-2020/b/file.txt
a/02-2020/c/file2.txt"""

"""
desired output:
a/b/01-2020/file.txt
a/b/02-2020/file.txt
a/c/01-2020/file2.txt
a/c/02-2020/file2.txt
"""

for s in t.splitlines():
    parts = s.split('/') # ['a', '01-2020', 'b', 'file.txt']
    to_move = parts.pop(2) # 'b'
    parts.insert(1, to_move) # ['a', 'b', '01-2020', 'file.txt']
    joined = '/'.join(parts)
    print(joined)
gturetsky

I would do it the following way:

objs = ["a/01-2020/b/file.txt",
"a/01-2020/c/file2.txt",
"a/02-2020/b/file.txt",
"a/02-2020/c/file2.txt"]
for key in objs:
    parts = key.split('/')
    parts[1], parts[2] = parts[2], parts[1]
    print('/'.join(parts))

Output:

a/b/01-2020/file.txt
a/c/01-2020/file2.txt
a/b/02-2020/file.txt
a/c/02-2020/file2.txt

Explanation: I use str.split because the delimiter is simply / (no need to import re), then swap the required parts and join them with /. Thanks to how tuple assignment works in Python, you can swap values this way, not only list elements. For example:

a = 10
b = 20
a,b = b,a
print(a)  # 20
print(b)  # 10
Daweo

Using str.split and str.join to split and concatenate the strings, and map to iterate over the sequence:

lst = [
    'a/01-2020/b/file.txt',
    'a/01-2020/c/file2.txt',
    'a/02-2020/b/file.txt',
    'a/02-2020/c/file2.txt',
]

def convert(sequence):
    """
    return sorted(list(map(lambda item:'/'.join(item), [[a, c, b, d]
        for a, b, c, d in map(lambda item:item.split('/'), sequence)])))
    """
    return sorted(
        list(
            map(
                lambda item:'/'.join(item),
                [
                    [a, c, b, d] for a, b, c, d in map(
                        lambda item:item.split('/'),
                        sequence
                    )
                ]
            )
        )
    )

for item in convert(lst):
    print(item)

Output:

a/b/01-2020/file.txt
a/b/02-2020/file.txt
a/c/01-2020/file2.txt
a/c/02-2020/file2.txt
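One caveat with sorted(): the dates are MM-YYYY strings, so they are compared as text, month first, and a later key such as a/b/01-2021/file.txt would sort ahead of a/b/02-2020/file.txt. If chronological order within each folder matters, a sort key that parses the date avoids this. A sketch, assuming keys already have the folder-before-date form produced by convert() (chronological_key is a hypothetical name):

```python
from datetime import datetime


def chronological_key(key):
    """Sort key for 'a/b/MM-YYYY/file.txt': group by the leading folders,
    then order by the parsed date instead of the raw 'MM-YYYY' text."""
    first, folder, date, *rest = key.split('/')
    return (first, folder, datetime.strptime(date, '%m-%Y'), rest)


keys = ['a/b/01-2021/file.txt', 'a/b/02-2020/file.txt', 'a/c/01-2020/file2.txt']
for item in sorted(keys, key=chronological_key):
    print(item)
```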
Jason Yang