0

I want to read the names of the files in a folder. For example, the folder contains:

abc-123.txt
def-456.txt
ghi-789.txt

I want to get "abc", "def", "ghi" without renaming the actual .txt file.

My files are like:

1789-Washington.txt
1793-Washington.txt
1797-Adams.txt
1801-Jefferson.txt
1805-Jefferson.txt

I want to put this information into a table with pandas so that it looks like this:

President    Year
Washington   1789
Washington   1793
Adams        1797
Jefferson    1801
Jefferson    1805
jalazbe

4 Answers

1

A solution for this can be found here: https://stackoverflow.com/a/678242/14191679. Apply os.path.splitext to the path of your file and, instead of printing the result, assign it to a variable, such as

import os
path = os.path.splitext(os.path.basename("/path/to/some/file.txt"))[0]

and use python string manipulation to get the necessary parts of the string you require.

So if you want to get the first 3 letters,

print(path[0:3])

The code above prints the first 3 characters of the string variable path.
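Slicing a fixed number of characters only works when the part you want always has the same length. As a rough sketch of an alternative, the helper below (name_prefix is a hypothetical name, not part of the os module) drops the directory and the extension and then keeps everything before the first hyphen. It assumes the hyphen is the separator, as in the question's file names.

```python
import os


def name_prefix(filepath, separator="-"):
    """Return the part of the file name that comes before the first separator.

    name_prefix is a hypothetical helper written for this example.
    """
    # Drop the directory, then the extension: "/some/dir/abc-123.txt" -> "abc-123"
    stem = os.path.splitext(os.path.basename(filepath))[0]
    # partition() returns (before, separator, after); if the separator is
    # missing, "before" is simply the whole stem
    return stem.partition(separator)[0]


print(name_prefix("/path/to/some/abc-123.txt"))         # abc
print(name_prefix("/path/to/some/1801-Jefferson.txt"))  # 1801
```

Unlike path[0:3], this keeps working when the prefix is longer or shorter than three characters.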

Axieof
1

Let's break down the problem a bit:

How do we get the names of files in a folder?

For this step we don't care about extracting part of the name yet; getting the full file name is fine. Just make sure you know how to list the files in a folder.

With a little research on how to list file names in a folder in Python, we get this:

# This import is required: os is the standard library module that provides listdir()
import os

# This gives us a list of the names of all the files in the folder provided to the os.listdir() function
my_files = os.listdir("/path/to/folder")

# Now we can loop through and print each file name. 
for file in my_files:
  print(file)

How do we extract part of the name of the file?

This problem has nothing to do with files really; it is about how to get part of a string, so I'd recommend you read through the Python 3 string documentation. In particular, if the part you need from the name always comes before a -, then str.split() should help.

You can then apply the same logic to each file in the loop above.
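As a minimal sketch of what applying str.split() to each name might look like, assuming every file name contains exactly the "before-after.txt" shape from the question (split_file_names is a hypothetical helper name, and the list is hard-coded so the example runs without a real folder):

```python
import os


def split_file_names(file_names):
    """Split names like "1797-Adams.txt" into (before, after) pairs.

    Assumes each name contains at least one hyphen.
    """
    pairs = []
    for file_name in file_names:
        # Remove the extension first: "1797-Adams.txt" -> "1797-Adams"
        stem = os.path.splitext(file_name)[0]
        # Split once on the hyphen: "1797-Adams" -> ["1797", "Adams"]
        before, after = stem.split("-", 1)
        pairs.append((before, after))
    return pairs


# In practice this list would come from os.listdir("/path/to/folder").
example_names = ["1797-Adams.txt", "1801-Jefferson.txt"]
print(split_file_names(example_names))
# [('1797', 'Adams'), ('1801', 'Jefferson')]
```

A name without a hyphen would raise a ValueError at the unpacking step, which is one way to notice files that do not follow the pattern.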

m.oulmakki
1

Here is a simple one-liner list comprehension solution for the above task:

[file.split(".")[0].split("-")[0] for file in os.listdir("path_to_dir")]

There are three main components in the above solution which you can read more about from the links below:

  1. os.listdir - lists files in a given directory
  2. split - splits the string into one or more parts based on some character in the string
  3. List comprehensions - quick way to construct lists on the fly rather than appending to a list in a loop
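One caveat with split(".")[0] is that it cuts the name at the first dot, so a file such as "b.v2-1.txt" would lose part of its name. A possible variation, sketched below under the assumption that only the final extension should be removed, uses os.path.splitext instead. The function name prefixes_in is hypothetical, and the temporary folder exists only so the example runs anywhere.

```python
import os
import tempfile


def prefixes_in(folder):
    """Return the sorted text before the first hyphen of each file name in folder."""
    # os.listdir() returns names in arbitrary order, so sort for a stable result
    return sorted(
        os.path.splitext(name)[0].split("-")[0]
        for name in os.listdir(folder)
    )


# Demo on a throwaway folder so the example does not depend on real files.
with tempfile.TemporaryDirectory() as folder:
    for name in ["abc-123.txt", "def-456.txt", "ghi-789.txt"]:
        open(os.path.join(folder, name), "w").close()
    print(prefixes_in(folder))  # ['abc', 'def', 'ghi']
```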
Felix Dombek
Rohit
0

I hope the below implementation answers your question

import os
import pandas as pd 

president = []
year = []

for filename in os.listdir("path_to_dir"):

    # Split the file name from its extension (.txt)
    name, extension = os.path.splitext(filename)

    # Split on the hyphen: the year comes first, the president second
    parts = name.split('-')
    year.append(parts[0])
    president.append(parts[1])

df = pd.DataFrame(list(zip(president, year)),
               columns =['President', 'Year'])
df.head()
    
Hemant Rakesh