Is there a better way to compile column data from multiple files than csv.DictReader?

Question

Preface: I'm very new to this work (as in just started last week) so I apologize in advance if my green-ness shows.

I have 3 separate, large files of data representing specific distances at different time points. Each file covers a third of the total time and is split into 53 columns: the first is the timestamp, and the other 52 are each a different measured distance, named 01A, 01B, 02A, 02B, etc. My ultimate goal is to create a histogram that combines the data for each distance, say 01A, from all three files.

I came up with this, which works perfectly for the smaller sample data files I made:

import csv 
import matplotlib.pyplot as plt 
Countries = []
with open("python_practice.txt", "r") as csv_file: 
    csv_file.readline()[1:]
    csv_reader = csv.DictReader(csv_file, delimiter='\t') 
    for lines in csv_reader:
        Country = lines['country'] 
        Countries.append(Country)
with open("python_practice1.txt", "r") as csv_file1: 
    csv_reader1 = csv.DictReader(csv_file1, delimiter='\t')
    for lines in csv_reader1: 
        Country1 = lines['country']
        Countries.append(Country1)

data = Countries
plt.hist(data, bins='auto')

But when I tried to make it work for a single file of my actual data via:

import csv 
import matplotlib.pyplot as plt 

Distances = []
with open("distances_1.traj", "r") as csv_file: 
    csv_file.readline()[1:]
    csv_reader = csv.DictReader(csv_file, delimiter='\t') 
    for lines in csv_reader:
        Distance = lines['01A'] 
        Distances.append(Distance)

data = Distances
plt.hist(data, bins='auto')

I get a KeyError: '01A'

I'm not sure why DictReader isn't able to 'recognize' the column name 01A, or how to fix this issue. So any and all advice is welcome here.
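One way to narrow this down is to look at what DictReader actually parsed as the header. Note that `csv_file.readline()[1:]` reads the first line and throws the result away (the slice is never assigned), so DictReader takes the next line as the column names. If the delimiter is not really a tab, the whole header line also ends up as a single column name. The following is only a diagnostic sketch: `inspect_header` and `read_column` are hypothetical helper names, and the stripping of whitespace and a leading `#` is an assumption about what the real header might look like, not something confirmed by the question.

```python
import csv


def inspect_header(path, delimiter="\t"):
    """Return the column names exactly as csv.DictReader parses them."""
    with open(path, "r", newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        # fieldnames is taken from the first line of the file; printing it
        # with repr() in the caller makes stray spaces or a '#' visible.
        return list(reader.fieldnames or [])


def read_column(path, column, delimiter="\t"):
    """Collect one column as floats, tolerating padded or '#'-prefixed headers."""
    values = []
    with open(path, "r", newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        # Assumption: header names may carry whitespace or a leading '#'.
        cleaned = [name.strip().lstrip("#").strip()
                   for name in (reader.fieldnames or [])]
        reader.fieldnames = cleaned
        if column not in cleaned:
            raise KeyError(f"{column!r} not found; parsed headers: {cleaned}")
        for row in reader:
            # Convert to float so the histogram bins numbers, not strings.
            values.append(float(row[column]))
    return values
```

Calling `print(inspect_header("distances_1.traj"))` should show whether `'01A'` is present as written, padded with spaces, or merged into one long string with the other names.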

Please provide the top few lines of your file(s). Without seeing your data, I doubt anyone will be able to figure out where the problem is (beyond the fact that csv clearly isn't reading the headers properly in your real files) — Diziet Asahi, May 15 '19 at 20:01
I don't know how large your "large files" are, but I'd suggest starting with `pandas` and importing the data through its `read_csv()` function (you'll find many examples on this site, e.g. [here](https://stackoverflow.com/questions/33642951/)); pandas also has built-in functions for plotting histograms. — Asmus, May 15 '19 at 20:10
@DizietAsahi I didn't add anything from the files initially because it's 53 columns wide and thus difficult to visualize — Riley, May 16 '19 at 00:11
Either the file is missing the '01A' column or the column name has some leading or trailing whitespace. — snakecharmerb, May 17 '19 at 06:03
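
The pandas and whitespace suggestions in the comments above could be combined roughly as follows. This is a minimal sketch, not a confirmed fix: `combine_column` is a hypothetical helper, and it assumes the files are tab-delimited with a single header row, which the real `.traj` files may not be.

```python
import pandas as pd


def combine_column(paths, column, sep="\t"):
    """Concatenate one named column from several delimited files into a Series."""
    pieces = []
    for path in paths:
        frame = pd.read_csv(path, sep=sep)
        # Strip padding around header names (the whitespace case raised above).
        frame.columns = frame.columns.str.strip()
        pieces.append(frame[column])
    # ignore_index gives one continuous 0..n-1 index across all files.
    return pd.concat(pieces, ignore_index=True)
```

With the three file paths in a list, `plt.hist(combine_column(paths, "01A"), bins='auto')` would then plot the combined data for that distance. If the columns turn out to be separated by runs of spaces rather than tabs, passing `sep=r"\s+"` to the helper is one option to try.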
