I imported a CSV file into a pandas DataFrame. It has about 28 columns and I only want to keep 9 of them. This is what my code looks like:
import os, glob
import pandas as pd
#set the directory
os.chdir(r'C:\Documents\test')
#set the type of file
extension = 'csv'
#take all files with the csv extension into an array
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]
col_to_keep=[3,5] #is this how I would put the column position?
#combine all files in the list
df = [pd.read_csv(f, delimiter=';', error_bad_lines=False) for f in all_filenames]
print(df[col_to_keep])
Previously I had my variable col_to_keep set to a list of column names:
col_to_keep = ['Name', 'ID', 'Area', 'Length']
However, I get an error that reads:
line 21, in <module>
print(df[col_to_keep])
TypeError: list indices must be integers or slices, not list
I'm not sure what I'm doing wrong, because I tried using both the names of the columns and what I think are their positions. The only other reason I can think of is that some of the values in the columns are floats (e.g. 123.98).
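A minimal sketch of what the traceback seems to point at, assuming the list comprehension around pd.read_csv is the cause: df is then a plain Python list of DataFrames, so indexing it with a list raises the TypeError regardless of float values. The sample data and the names sample, frames, by_name and by_position are hypothetical, made up for illustration.

```python
import io
import pandas as pd

# Hypothetical stand-in for one semicolon-delimited file (not the real data)
sample = "Name;ID;Area;Length;Extra\nA;1;123.98;4.5;x\nB;2;7.25;3.0;y\n"

# A list comprehension over read_csv yields a list of DataFrames,
# not a single DataFrame
frames = [pd.read_csv(io.StringIO(sample), delimiter=';') for _ in range(2)]

# Indexing the list itself with a list of names reproduces the error
try:
    frames[['Name', 'ID']]
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Indexing one DataFrame from the list works, float columns included
by_name = frames[0][['Name', 'ID', 'Area', 'Length']]

# Selecting by position needs .iloc, and positions are zero-based
by_position = frames[0].iloc[:, [3, 4]]
print(by_name)
print(by_position)
```

If this reading is right, the column list was fine in both forms; it just has to be applied to each DataFrame (or to a concatenated one) rather than to the list.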
I plan on taking this information and bringing it into Excel, and eventually I will create a loop that goes through all CSV files in a specific folder.
How would I be able to only keep the columns that I want?
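One possible approach, sketched under assumptions: pass the wanted columns to read_csv through its usecols parameter, then stack the per-file results with pd.concat. The in-memory fake_files dictionary is a hypothetical stand-in for the files that glob would find, and the column names are the four listed above rather than the full set of 9.

```python
import io
import pandas as pd

# Columns to keep (taken from the question; the real list would have 9 names)
cols_to_keep = ['Name', 'ID', 'Area', 'Length']

# Hypothetical in-memory stand-ins for the CSV files found by glob
fake_files = {
    'a.csv': "Name;ID;Area;Length;Extra\nA;1;123.98;4.5;x\n",
    'b.csv': "Name;ID;Area;Length;Extra\nB;2;7.25;3.0;y\n",
}

# usecols makes read_csv load only the listed columns from each file
frames = [
    pd.read_csv(io.StringIO(text), delimiter=';', usecols=cols_to_keep)
    for text in fake_files.values()
]

# Stack the per-file DataFrames into a single DataFrame
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

With real files, each io.StringIO(text) would be replaced by a filename from all_filenames. The combined DataFrame could then be written out with combined.to_excel(...), which needs an Excel engine such as openpyxl installed. Note also that error_bad_lines was removed in pandas 2.0; on_bad_lines='skip' is the replacement in newer versions.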
Thank you in advance.