
I have a PDF file and about 130 .txt files.

The PDF file is useless and needs to be skipped over. Each .txt file contains name data, and each .txt file represents a year ranging from 1880-2010.

All of the .txt files have the same format: Name, Sex, Count of people that had that name in that specific year. Below is an example of one of the .txt files:

Mary,M,8754
Susan,M,5478
Brandy,M,5214
etc...

There are probably thousands of names in each .txt file. My question is basically what the title asks, though: how can I efficiently read each .txt file into its own separate but accessible DataFrame? I want to be able to quickly search through them and extract things like the mean or standard deviation of the counts for a specific name.

I've already looked into multiple topics with similar questions/concerns, but none of them have been of any real use to me:

Import multiple csv files into pandas and concatenate into one DataFrame

Read multiple *.txt files into Pandas Dataframe with filename as column header

creating pandas data frame from multiple files

Any and all advice is appreciated.

shadewolf

1 Answer

import pandas as pd
from glob import glob

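path = 'your_path'  # replace with the folder that holds the .txt files
files = glob(path + '/*.txt')  # the pattern matches only .txt files, so the PDF is skipped

# Every file has the same three columns and no header row.
def get_df(f):
    return pd.read_csv(f, header=None, names=['Name', 'Sex', 'Count'])

# Dict of DataFrames, keyed by file path.
dodf = {f: get_df(f) for f in files}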
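
To get at the mean or standard deviation for a specific name, which is what the question asks for, it may be easier to key the dict by year instead of by file path. The following is only a sketch under assumptions that are not in the original post: it assumes each file name contains its four-digit year (for example a hypothetical `yob1880.txt`), and `load_by_year` and `name_stats` are made-up helper names.

```python
import re
from pathlib import Path

import pandas as pd


def load_by_year(folder):
    """Return {year: DataFrame} for every .txt file whose name contains a 4-digit year."""
    frames = {}
    # Only *.txt is matched, so a PDF in the same folder is never read.
    for path in sorted(Path(folder).glob("*.txt")):
        match = re.search(r"\d{4}", path.stem)
        if match is None:
            continue  # assumed naming scheme not matched; skip the file
        frames[int(match.group())] = pd.read_csv(
            path, header=None, names=["Name", "Sex", "Count"]
        )
    return frames


def name_stats(frames, name):
    """Mean and sample standard deviation of a name's yearly count (both sexes summed)."""
    # Tag each frame with its year, then stack everything into one long table.
    combined = pd.concat(
        [df.assign(Year=year) for year, df in frames.items()],
        ignore_index=True,
    )
    yearly = combined[combined["Name"] == name].groupby("Year")["Count"].sum()
    return yearly.mean(), yearly.std()
```

With this, `frames[1880]` would be the DataFrame for that year, and `name_stats(frames, 'Mary')` would return the two statistics across all years in which the name appears. Years where the name is absent are left out rather than counted as zero.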
piRSquared
  • Ahhh, this is a great solution. A couple of questions though... does your "dodf" stand for "data of data frames"? Also, now that each file is in its own respective DataFrame, how can I access them (say I want to print the first file)? – shadewolf Mar 22 '17 at 20:08
  • @shadewolf `'dict of dataframes'` – piRSquared Mar 22 '17 at 20:09
  • @shadewolf you can use `dodf[files[0]]` – piRSquared Mar 22 '17 at 20:10
  • I'm getting a "list index out of range" error on your above example, which is weird since each file has already been assigned to a DataFrame. Am I missing something here? – shadewolf Mar 22 '17 at 20:16
  • @shadewolf you may be overwriting the `files` name? That makes no sense to me. – piRSquared Mar 22 '17 at 20:18
  • I don't have any other variable named `files`, so I don't think it's being overwritten... hmmm. I'm using exactly what you posted too. – shadewolf Mar 22 '17 at 20:24
  • This is really helpful. Can you please help me with the case where there are multiple sheets in the Excel files? So, maybe put different sheets into different DataFrames. Any help would be appreciated. @piRSquared – Manas Jani Sep 13 '17 at 18:07
  • @ManasJani It's better to ask a new question. That way, your specific question can be addressed and the people making the effort to answer it have the opportunity to earn reputation via upvotes and possibly getting the accepted answer. – piRSquared Sep 13 '17 at 18:16
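
On the "list index out of range" error raised in the comments above: `files[0]` can only fail that way when `files` is an empty list, which would mean the glob pattern matched nothing (for instance because `path` does not point at the folder with the .txt files). The thread never confirms that this was the cause, so treat the following as a possible check rather than the fix. `list_txt_files` is a hypothetical helper name.

```python
import os
from glob import glob


def list_txt_files(path):
    """Return the sorted .txt files in `path`, failing loudly if none match."""
    files = sorted(glob(os.path.join(path, "*.txt")))
    if not files:
        # An empty result is what makes files[0] raise IndexError later on.
        raise FileNotFoundError(
            f"no .txt files matched in {os.path.abspath(path)!r}"
        )
    return files
```

Using `files = list_txt_files(path)` in place of the bare `glob` call would surface a wrong path immediately, instead of producing an empty dict and a confusing index error afterwards.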