
I would like to save different positions of a file name in different columns of a pandas DataFrame.

For example my file names look like this:

001015io.png
  • positions 0-2 go in column 'y position', in this case '001'
  • positions 3-5 go in column 'x position', in this case '015'
  • positions 6-7 go in column 'status', in this case 'io'

My folder contains about 400 of these picture files. I'm a beginner in programming, so I don't know how I should start to solve this.

marc_s
fa1992

2 Answers

If the parts of the file names that you need are consistent (same position and length in all files), you can use string slicing to create new columns from the pieces of the file name like this:

import pandas as pd

df = pd.DataFrame({'file_name': ['001015io.png']})

df['y position'] = df['file_name'].str[0:3]
df['x position'] = df['file_name'].str[3:6]
df['status'] = df['file_name'].str[6:8]

This results in the dataframe:

      file_name y position x position status
0  001015io.png        001        015     io

Note that when you slice a string you give a start position and a stop position, like [0:3]. The start position is inclusive but the stop position is not, so [0:3] gives you the characters at positions 0 through 2.
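To make the inclusive/exclusive rule concrete, here is a small plain-Python sketch using the file name from the question. The variable names are only illustrative.

```python
filename = "001015io.png"

# The stop index is exclusive, so [0:3] covers indices 0, 1 and 2.
y_position = filename[0:3]
x_position = filename[3:6]
status = filename[6:8]

# The length of a slice is stop - start.
print(len(y_position), len(x_position), len(status))  # 3 3 2

# Adjacent slices share a boundary without overlapping or leaving gaps.
print(y_position + x_position + status == filename[0:8])  # True
```

Because each stop index is the next start index, the three pieces line up exactly with the first eight characters of the name.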

Bill the Lizard

You can do this with slicing. A string is basically a sequence of characters, so you can slice it into the parts you need. See the example below.

filename = '001015io.png'

# Positions 0-2 are the y position and 3-5 the x position, as in the question
y = filename[0:3]
x = filename[3:6]
status = filename[6:8]

print(y, x, status)

output

001 015 io

As for getting the list of files, there's an absurdly complete answer for that here: https://stackoverflow.com/a/41447012/9267296

I have this function below in my personal library which I reuse whenever I need to generate a list of files.

import os


def get_files_from_path(path: str = ".", ext=None) -> list:
    """Find files in path and return them as a list.

    Walks the given folder and all of its subfolders.

    See the answer linked below for a ridiculously
    complete treatment of this. I tend to use this one.
    https://stackoverflow.com/a/41447012/9267296

    Args:
        path (str, optional): Which path to start on.
                              Defaults to '.'.
        ext (str/list, optional): Optional file extension(s).
                                  Defaults to None.

    Returns:
        list: list of full file paths
    """
    # Normalize ext to a tuple of lowercase suffixes (None means "all files")
    if isinstance(ext, str):
        ext = [ext]
    suffixes = None if ext is None else tuple(e.lower() for e in ext)

    result = []
    for subdir, dirs, files in os.walk(path):
        for fname in files:
            # str.endswith accepts a tuple, so each file is added at most once
            if suffixes is None or fname.lower().endswith(suffixes):
                result.append(os.path.join(subdir, fname))
    return result

There's one thing you need to take into account here: this function returns the full file path, for example path/to/file/001015io.png

You can use the code below to get just the filename:

import os
print(os.path.basename('path/to/file/001015io.png'))

output

001015io.png

Use what Bill the Lizard said to turn it into a DataFrame.
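Putting the pieces together, a minimal sketch might look like the following. The helper name build_position_frame and the example paths are made up for illustration. In practice the list of paths could come from get_files_from_path, and the slice positions assume every file name follows the layout from the question.

```python
import os

import pandas as pd


def build_position_frame(file_paths):
    """Build a DataFrame with one row per file (hypothetical helper)."""
    # Strip the folder part so the slices line up with the file name itself.
    names = [os.path.basename(p) for p in file_paths]
    df = pd.DataFrame({"file_name": names})
    df["y position"] = df["file_name"].str[0:3]
    df["x position"] = df["file_name"].str[3:6]
    df["status"] = df["file_name"].str[6:8]
    return df


# Made-up example paths; a real list could come from
# get_files_from_path("path/to/file", ext=".png").
example_paths = ["path/to/file/001015io.png", "path/to/file/002007ni.png"]
df = build_position_frame(example_paths)
print(df)
```

The columns hold strings, so leading zeros such as '001' are preserved.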

Edo Akse