
I have a directory of 50 txt files. I want to combine the contents of each file into a Python list.

Each file looks like:

line1
line2
line3

I am putting the file paths into a list with the code below. I just need to loop through `file_list` and append the contents of each txt file to a list.

from pathlib import Path


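def searching_all_files(dirpath=Path(r'C:\num')):
    # Collect every file under dirpath, descending into subfolders
    assert dirpath.is_dir()
    file_list = []
    for x in dirpath.iterdir():
        if x.is_file():
            file_list.append(x)
        elif x.is_dir():
            # Recurse with the subfolder as the new starting point
            file_list.extend(searching_all_files(x))
    return file_list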

But I am unsure of the best method.

Maybe a loop something like this?

NOTE: This is not real code, just a thought pulled from the air. The question isn't how to fix this snippet; I am only showing it as an idea. All methods welcome.

file_path = Path(r'.....')
with open(file_path) as f:
    source_path = f.read().splitlines()
# source_nospaces holds this file's lines with surrounding spaces stripped
source_nospaces = [x.strip(' ') for x in source_path]
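Putting the two pieces together: a minimal sketch, assuming every file under the directory is plain text readable with the default encoding. It walks the folder recursively with `Path.rglob('*')` instead of the hand-written recursion, reads each file with `read().splitlines()` as in the snippet above, strips spaces, and extends one combined list. The helper name `combine_files` and the demo files are hypothetical.

```python
import tempfile
from pathlib import Path


def combine_files(dirpath):
    """Return the lines of every file under dirpath as one flat list (hypothetical helper)."""
    dirpath = Path(dirpath)
    assert dirpath.is_dir()
    combined = []
    # rglob('*') also walks subfolders; sorted() makes the file order deterministic
    for file_path in sorted(dirpath.rglob('*')):
        if not file_path.is_file():
            continue
        with open(file_path) as f:
            lines = f.read().splitlines()  # drops the trailing newline from each line
        # extend (not append) so the result is one flat list of lines
        combined.extend(line.strip(' ') for line in lines)
    return combined


# Demo with a throwaway directory instead of the real C:\num folder
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'a.txt').write_text('line1\nline2\nline3\n')
    Path(tmp, 'b.txt').write_text(' line4 \nline5\n')
    result = combine_files(tmp)

print(result)  # ['line1', 'line2', 'line3', 'line4', 'line5']
```

Using `extend` rather than `append` is the key choice: `append` would give a list of per-file lists instead of one flat list of lines.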
uncrayon
  • As a slight shortcut, you can use `.readlines()` instead of `.read().splitlines()`. – John Gordon Feb 17 '23 at 23:08
  • Are you trying to build a search engine for your files? Are you only looking for text? Using getChunk will load in blocks of data which can be searched. – Golden Lion Feb 17 '23 at 23:08
  • I just need to put the contents of these files into a list to save time. That's it. – uncrayon Feb 17 '23 at 23:11
  • See https://stackoverflow.com/a/45172387 for how you can use `iglob(.., recursive=True)` to get all files returned automagically without having to handle them yourself. Also, `.readlines()` and `.read().splitlines()` will behave differently; the first will include the newline at the end of each line, while the latter won't. You can use `list.extend` with the returned value from `f.read().splitlines()` to append the content of each file to your main list. – MatsLindh Feb 17 '23 at 23:11
  • "As a slight shortcut, you can use .readlines() instead of .read().splitlines()" A shortcut to what? That snippet was me spitballing. – uncrayon Feb 17 '23 at 23:12
  • @MatsLindh I'd need to see an example. I don't follow. – uncrayon Feb 17 '23 at 23:13
  • If you're just trying to combine a set of files by concatenation, you can do that from a command line with no programming required. Is that your task? – Tim Roberts Feb 17 '23 at 23:16
  • @TimRoberts No, I don't want to do `copy *.txt newfile.txt`; see the OP. – uncrayon Feb 17 '23 at 23:23
  • If you're on Windows, you can do `copy a.txt+b.txt+c.txt+d.txt out.txt`. On Linux, you can use `cat` to do the same function. Why wouldn't you want the easiest method that solves your problem? – Tim Roberts Feb 17 '23 at 23:41
  • Tim, I literally said I didn't want to do that. See my code snippet for a more succinct way, btw. I solved the problem in the OP... – uncrayon Feb 28 '23 at 19:18
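An example of what the comment about `iglob(..., recursive=True)` and `list.extend` describes, assuming the files end in `.txt` and sit anywhere under one root folder (the sample files are hypothetical). The `'**'` pattern only descends into subfolders when `recursive=True` is passed. The last lines show the `readlines()` versus `read().splitlines()` difference mentioned in the comments.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical sample data: one file at the top level, one in a subfolder
    os.makedirs(os.path.join(root, 'sub'))
    with open(os.path.join(root, 'a.txt'), 'w') as f:
        f.write('line1\nline2\n')
    with open(os.path.join(root, 'sub', 'b.txt'), 'w') as f:
        f.write('line3\n')

    all_lines = []
    # '**' matches zero or more directories because recursive=True
    pattern = os.path.join(root, '**', '*.txt')
    for name in sorted(glob.iglob(pattern, recursive=True)):
        with open(name) as f:
            all_lines.extend(f.read().splitlines())

    # readlines() keeps the newline at the end of each line; splitlines() does not
    with open(os.path.join(root, 'a.txt')) as f:
        with_newlines = f.readlines()

print(all_lines)      # ['line1', 'line2', 'line3']
print(with_newlines)  # ['line1\n', 'line2\n']
```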

1 Answer


You could use `Path.rglob()` from pathlib to search a directory recursively for all `.txt` files, and `readlines()` to append their contents to a list:

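from pathlib import Path

# Recursively find every .txt file under /tmp/text
files = Path('/tmp/text').rglob('*.txt')
res = []
for file in files:
    # A with block closes each file after reading it
    with open(file) as f:
        res += f.readlines()
print(res)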

Out:

['file_content2\n', 'file_content3\n', 'file_content1\n']
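The output above keeps the trailing newline from `readlines()`, and `rglob()` does not guarantee any particular order (here `file_content2` comes first). A small variation, assuming UTF-8 text files, uses `Path.read_text()` with `splitlines()` and sorts the paths. A temporary directory with hypothetical sample files stands in for `/tmp/text` so the sketch runs anywhere.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical sample files mirroring the answer's output
    for n in (2, 3, 1):
        (root / f'file{n}.txt').write_text(f'file_content{n}\n', encoding='utf-8')

    res = []
    # sorted() gives a stable order; read_text() opens and closes the file itself
    for file in sorted(root.rglob('*.txt')):
        res.extend(file.read_text(encoding='utf-8').splitlines())

print(res)  # ['file_content1', 'file_content2', 'file_content3']
```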
Maurice Meyer
  • You forgot the `r`. This code will result in the path having too many \. For anyone reading, Pathlib gives a horribly misleading error, too. It makes it sound like something else is happening. Always use `Path(r'path...')` – uncrayon Feb 28 '23 at 19:14
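Why the `r` prefix matters for Windows-style paths: in an ordinary string literal a backslash starts an escape sequence, so in `'C:\num'` (the folder from the question) the `\n` becomes a newline character and the backslash disappears. A forward-slash path such as `/tmp/text` has no backslashes and is unaffected. A quick check using `PureWindowsPath`, which parses Windows paths on any operating system:

```python
from pathlib import PureWindowsPath

plain = 'C:\num'   # '\n' is read as a newline escape: 'C:' + newline + 'um'
raw = r'C:\num'    # raw string: the backslash is kept literally

print(len(plain), len(raw))  # 5 6
print('\n' in plain)         # True

# With the raw string the path has a drive, a root and a 'num' folder
print(PureWindowsPath(raw).parts)  # ('C:\\', 'num')
# Without it there is no separator after the drive, so the path is not absolute
print(PureWindowsPath(plain).is_absolute(), PureWindowsPath(raw).is_absolute())
```

Because the mangled path is still a syntactically valid (if strange) path, the eventual "not found" style error can look unrelated to the missing `r`.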