I have around 65,000 JPG images of cars, and each filename encodes information about the car. For example:
Acura_ILX_2013_28_16_110_15_4_70_55_179_39_FWD_5_4_4dr_aWg.jpg
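The underscore-separated fields are, in order: make, model, year, MSRP, front wheel size, SAE net horsepower, followed by
'Displacement', 'Engine Type', 'Width, Max w/o mirrors (in)', 'Height, Overall (in)',
'Length, Overall (in)', 'Gas Mileage', 'Drivetrain', 'Passenger Capacity', 'Passenger Doors',
'Body Style', 'unique identifier'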
Because there are several images of the same car, a unique 3-letter identifier is appended to the end of each filename.
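To make the layout concrete, here is a minimal sketch of how a single filename breaks down into its 17 parts. The field names and the `parse_filename` helper are just labels I made up for this illustration; they are not part of the dataset.

```python
from pathlib import Path

# Hypothetical labels for the 17 underscore-separated parts, in filename order.
FIELDS = [
    "make", "model", "year", "msrp", "front_wheel_size", "sae_net_hp",
    "displacement", "engine_type", "width", "height", "length", "mpg",
    "drivetrain", "passenger_capacity", "doors", "body_style", "unique_id",
]

def parse_filename(filename):
    """Map one filename onto the field names above."""
    parts = Path(filename).stem.split("_")  # stem drops the ".jpg" extension
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {filename}")
    return dict(zip(FIELDS, parts))

record = parse_filename("Acura_ILX_2013_28_16_110_15_4_70_55_179_39_FWD_5_4_4dr_aWg.jpg")
print(record["make"], record["drivetrain"], record["unique_id"])  # Acura FWD aWg
```

All values come back as strings, so numeric fields such as the year would still need converting.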
I have created a data frame from the file names using the following code:
import os
import pandas as pd

car_file = os.listdir(r"dir")
make = []
model = []
year = []
msrp = []
front_wheel_size = []
sae_net_hp = []
displacement = []
engine_type = []
width = []
height = []
length = []
mpg = []
drivetrain = []
passenger_capacity = []
doors = []
body_style = []
# Split each filename on "_" and append every field to its own list
for i in car_file:
    make.append(i.split("_")[0])
    model.append(i.split("_")[1])
    year.append(i.split("_")[2])
    msrp.append(i.split("_")[3])
    front_wheel_size.append(i.split("_")[4])
    sae_net_hp.append(i.split("_")[5])
    displacement.append(i.split("_")[6])
    engine_type.append(i.split("_")[7])
    width.append(i.split("_")[8])
    height.append(i.split("_")[9])
    length.append(i.split("_")[10])
    mpg.append(i.split("_")[11])
    drivetrain.append(i.split("_")[12])
    passenger_capacity.append(i.split("_")[13])
    doors.append(i.split("_")[14])
    body_style.append(i.split("_")[15])
df = pd.DataFrame([make,model,year,msrp,front_wheel_size,sae_net_hp,displacement,engine_type,width,height,length,mpg,drivetrain,passenger_capacity,doors,body_style]).T
(I presume this is not the cleanest way to do it.)
My question is: how can I most efficiently include the JPG image in the dataset, perhaps as an additional column at the end?
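To illustrate the kind of result I have in mind, here is a rough sketch that keeps the path to each image in an extra column and only reads the file when it is needed. It assumes all images sit in one directory; `build_records` and `load_image_bytes` are hypothetical helper names, and I don't know whether this is the most efficient approach.

```python
import tempfile
from pathlib import Path

def build_records(image_dir):
    """Return one dict per .jpg file: positional filename parts plus the file path."""
    records = []
    for path in sorted(Path(image_dir).glob("*.jpg")):
        record = dict(enumerate(path.stem.split("_")))  # keys 0, 1, 2, ...
        record["image_path"] = str(path)                # extra column at the end
        records.append(record)
    return records

def load_image_bytes(record):
    """Read the raw file bytes for one record, only when they are needed."""
    return Path(record["image_path"]).read_bytes()

# Demo on a throwaway directory with one placeholder file (not a real JPEG).
with tempfile.TemporaryDirectory() as tmp:
    name = "Acura_ILX_2013_28_16_110_15_4_70_55_179_39_FWD_5_4_4dr_aWg.jpg"
    (Path(tmp) / name).write_bytes(b"placeholder")
    records = build_records(tmp)
    first_bytes = load_image_bytes(records[0])

print(records[0][0], len(first_bytes))
```

The list of dicts can be passed straight to `pd.DataFrame(records)`, which would give an `image_path` column at the end. Decoding the bytes into pixels would need an image library, which I have left out here.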