
I am very new to Python, so please bear with me. I have a folder of CSV files with no header row: the first row is already data I need to work with. So I need to give them column names so I can refer to the columns later. Each CSV has the same number of columns; for practice I'm using three.

I understand how to add column names to a single file:

import pandas as pd
my_file = pd.read_csv(r'path\the_file.csv', names=['first', 'second', 'third'])

But I need to go into my directory and loop through a large number of CSV files, and I'm honestly not even sure how to do that (sad, I know). I've managed to loop through the file names using os.listdir, but that only gives me the names, not the data in the files. I know what to do once I have the column names.
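Because os.listdir returns bare file names, one way to get at the data is to join each name with the folder path and hand that to read_csv. A minimal sketch, assuming every `.csv` file in the folder has the same three columns; the function name `read_all_csvs` is hypothetical:

```python
import os

import pandas as pd


def read_all_csvs(folder, col_names):
    """Read every .csv file in `folder`, giving each the same column names.

    Returns a dict mapping file name -> DataFrame.
    """
    frames = {}
    for name in sorted(os.listdir(folder)):
        # os.listdir returns bare names, so skip anything that isn't a CSV
        if not name.lower().endswith('.csv'):
            continue
        # join folder and name so read_csv gets a usable path
        path = os.path.join(folder, name)
        # passing names= makes pandas treat the first row as data, not a header
        frames[name] = pd.read_csv(path, names=col_names)
    return frames
```

Usage, with `'files'` standing in for the real folder: `frames = read_all_csvs('files', ['first', 'second', 'third'])`, then `frames['some_file.csv']['second']` gives one file's second column.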

Using pandas whenever possible is highly preferable. I've looked a lot but can't seem to find anything that actually works. I'd really appreciate the help!

Edit: This is part of what I'll be doing, but I need to do it for ALL the CSV files in the folder.

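import pandas as pd

# raw string so '\t' in the path is not read as a tab character
my_file = pd.read_csv(r'path\the_file.csv', names=['first', 'second', 'third'])
first_col = my_file['first']
second_col = my_file['second']
third_col = my_file['third']
key_codes = []
# join column 2 and column 3 as strings, row by row
key_codes.append(second_col.map(str) + third_col.map(str))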

So, if column 2 has "123" and column 3 has "4", then I'm making "1234". I'm doing more than that, but for now I just need to figure out how to loop through the files and give them all the same column names.
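The "123" + "4" → "1234" step can be checked on a tiny in-memory CSV. This sketch uses made-up data and `io.StringIO` in place of a real file:

```python
import io

import pandas as pd

# A tiny in-memory CSV standing in for one of the real files (made-up data).
csv_text = 'a,123,4\nb,56,7\n'
df = pd.read_csv(io.StringIO(csv_text), names=['first', 'second', 'third'])

# map(str) turns each value into text, so + joins rather than adds
key_codes = df['second'].map(str) + df['third'].map(str)
print(key_codes.tolist())  # ['1234', '567']
```

One design note: read_csv parses "123" as the integer 123, so a value such as "0123" would lose its leading zero. Passing `dtype=str` to read_csv keeps every column as text if that matters for the codes.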

  • I don't think I understand the problem. Please explain clearly what is not working. – Julien Jun 24 '16 at 07:26
  • Hi @wiredflamingo, can you show us your steps, please? Remember, SO is not a code factory nor an outsourcing code writing... Welcome to SO or sort of ... – Andy K Jun 24 '16 at 07:28
  • When you loop over all the files, what is the desired output? A list of `DataFrames`? Or do you need to concat all the DataFrames into one? – jezrael Jun 24 '16 at 07:32
  • Sorry, I have a folder with a bunch of CSV files. I need to make a for loop that would add column names/headers to each CSV. I don't need to write over the original file, just need to make it so I can call the columns in my code. For example, if column 1 says, "123" and column 2 says, "4" I'd make it "1234" but I know how to do that, I just need to be able to call the columns. Hope this helps. – WiredFlamingo Jun 24 '16 at 07:34
  • Sorry, but do you only need to add column names and write each file to a new CSV? And does each file get a different header, or do they all have the same header? Why do you need pandas? – jezrael Jun 24 '16 at 07:40
  • Ah sorry I updated my question I hope that helps. I'm learning this all with little instruction so it is quite difficult for me. As for why I need pandas, that's just generally what I've worked with in this past, so it helps my understanding, but I'm open to others just need clear explanations. – WiredFlamingo Jun 24 '16 at 07:48

1 Answer


IIUC (if I understand correctly), you need glob to collect the CSV paths:

#glob can use a path with wildcards such as *.txt or *.csv - see http://stackoverflow.com/a/3215392/2901002
import glob
import pandas as pd

key_codes = []
for path in glob.glob('files/*.csv'):
    # passing names= also makes pandas treat the first row as data
    df = pd.read_csv(path, names=['first', 'second', 'third'])
    key_codes.append(df['second'].map(str) + df['third'].map(str))
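If the goal is one combined table rather than a list of Series (jezrael's question in the comments), the per-file DataFrames can be stacked with pd.concat. A sketch under that assumption; the function name `load_folder` and the extra `source` column are illustrative additions, not something the question asked for:

```python
import glob
import os

import pandas as pd


def load_folder(pattern, col_names):
    """Read every file matching `pattern` and stack them into one DataFrame.

    Adds a 'source' column holding each row's file name.
    """
    frames = []
    for path in sorted(glob.glob(pattern)):
        df = pd.read_csv(path, names=col_names)
        # remember which file each row came from
        df['source'] = os.path.basename(path)
        frames.append(df)
    # ignore_index=True renumbers rows 0..n-1 across all files
    return pd.concat(frames, ignore_index=True)
```

Usage: `all_rows = load_folder('files/*.csv', ['first', 'second', 'third'])`, then `all_rows['key_code'] = all_rows['second'].map(str) + all_rows['third'].map(str)` builds the codes for every file in one step. If no files match, pd.concat raises a ValueError, so an empty folder needs a check before the call.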

Another solution selects the second and third columns by position with iloc; it drops the names parameter and passes header=None to read_csv instead:

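#glob can use a path with wildcards such as *.txt or *.csv - see http://stackoverflow.com/a/3215392/2901002
import glob
import pandas as pd

key_codes = []
for path in glob.glob('files/*.csv'):
    # header=None keeps the first row as data; columns are numbered 0, 1, 2
    df = pd.read_csv(path, header=None)
    # iloc[:, 1] and iloc[:, 2] are the second and third columns by position
    key_codes.append(df.iloc[:, 1].astype(str) + df.iloc[:, 2].astype(str))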
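With iloc the first index is the row and the second is the column, so `iloc[0, 1]` picks a single cell while `iloc[:, 1]` picks the whole column. A small sketch on made-up in-memory data showing the difference:

```python
import io

import pandas as pd

# Made-up data with no header row; header=None numbers the columns 0, 1, 2
df = pd.read_csv(io.StringIO('a,123,4\nb,56,7\n'), header=None)

cell = df.iloc[0, 1]       # one value: row 0, column 1
column = df.iloc[:, 1]     # every row of column 1, as a Series

key_codes = df.iloc[:, 1].astype(str) + df.iloc[:, 2].astype(str)
print(cell)                # 123
print(key_codes.tolist())  # ['1234', '567']
```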
jezrael
  • Oh thank you! I didn't see that link before. I had tried something very similar to what you posted actually, but I lost track of all my attempts eventually. – WiredFlamingo Jun 24 '16 at 08:02