Is there a better way to use the with open(file) as f: f.read()
pattern inside a for loop, i.e. a list comprehension that operates on many files?
I am trying to load the results into a DataFrame that maps each file name to its contents.
Here is what I have, but it seems inefficient and not very Pythonic or readable:
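import glob
import numpy as np
import pandas as pd

documents = pd.DataFrame(glob.glob('*.txt'), columns=['files'])
# object dtype so the column can hold strings alongside NaN
documents['text'] = pd.Series([np.nan] * len(documents), dtype=object)
for txtfile in documents['files'].tolist():
    if txtfile.startswith('GSE'):
        with open(txtfile) as f:
            # .loc avoids chained assignment, which may write to a copy
            documents.loc[documents['files'] == txtfile, 'text'] = f.read()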
output:
                   files                                               text
0   GSE2640_GSM50721.txt  | RNA was extracted from lung tissue using a T...
1  GSE7002_GSM159771.txt  Array Type : Rat230_2 ; Amount to Core : 15 ; ...
2   GSE1560_GSM26799.txt  | C3H denotes C3H / HeJ mice whereas C57 denot...
3   GSE2171_GSM39147.txt  | HIV seropositive , samples used to test HIV ...
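
For comparison, here is a rough sketch of the kind of comprehension-based version I have in mind. It is only one possible approach, not necessarily the best one. The helper name read_texts and its directory and pattern parameters are hypothetical, made up for illustration. It assumes the files can be decoded with the platform's default encoding, and unlike the loop above it skips non-GSE files entirely instead of leaving NaN rows for them.

```python
from pathlib import Path

import pandas as pd


def read_texts(directory=".", pattern="GSE*.txt"):
    """Return a DataFrame with one row per matching file (hypothetical helper)."""
    # The glob pattern does the 'GSE' prefix filtering; sorting keeps row order stable.
    paths = sorted(Path(directory).glob(pattern))
    return pd.DataFrame(
        {
            "files": [p.name for p in paths],
            # Path.read_text() opens, reads, and closes each file.
            "text": [p.read_text() for p in paths],
        }
    )


documents = read_texts()
```

Building both columns up front avoids the per-file boolean mask, which rescans the whole files column on every iteration.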