
Preface: I'm very new to this work (as in, I just started last week), so I apologize in advance if my greenness shows.

I have three separate, large files of data representing specific distances at different time points. Each file covers a third of the total time and is split into 53 columns: the first is the timestamp, and each of the other 52 is a different measured distance, named 01A, 01B, 02A, 02B, etc. My ultimate goal is to create a histogram for each distance (say 01A) that combines its data from all three files.
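For the combining step itself, a minimal sketch using only the standard library might look like this. It assumes each file is tab-separated with a single header row whose names include `01A`, `01B`, and so on; the function name `collect_column` is hypothetical, and values are converted to floats so they can be plotted as numbers.

```python
import csv
from typing import Iterable, List


def collect_column(paths: Iterable[str], column: str, delimiter: str = "\t") -> List[float]:
    """Read `column` from every file in `paths` and return all values as floats.

    Assumes each file has one header row naming the columns (e.g. 01A, 01B, ...)
    and that fields are separated by `delimiter`.
    """
    values: List[float] = []
    for path in paths:
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle, delimiter=delimiter)
            for row in reader:
                cell = row[column].strip()
                if cell:  # skip empty cells rather than failing on float("")
                    values.append(float(cell))
    return values
```

With the actual files (`distances_1.traj` is from the question; `distances_2.traj` and `distances_3.traj` are assumed names), the result could go straight to `plt.hist(collect_column(["distances_1.traj", "distances_2.traj", "distances_3.traj"], "01A"), bins='auto')`.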

I came up with the following, which works perfectly for some smaller sample data files I made:

```python
import csv
import matplotlib.pyplot as plt

Countries = []

# First practice file
with open("python_practice.txt", "r") as csv_file:
    csv_file.readline()[1:]  # reads and discards the first line; the [1:] slice is unused
    csv_reader = csv.DictReader(csv_file, delimiter='\t')  # next line is used as the header
    for lines in csv_reader:
        Country = lines['country']
        Countries.append(Country)

# Second practice file (no line skipped before the header)
with open("python_practice1.txt", "r") as csv_file1:
    csv_reader1 = csv.DictReader(csv_file1, delimiter='\t')
    for lines in csv_reader1:
        Country1 = lines['country']
        Countries.append(Country1)

# Histogram of the combined values from both files
data = Countries
plt.hist(data, bins='auto')
```
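One detail worth checking: the first snippet calls `csv_file.readline()[1:]` before creating the `DictReader`. That call reads and throws away the first line of the file (the `[1:]` slice only trims the discarded string), so `DictReader` treats the *next* line as the header. That is harmless if the practice file happens to start with an extra line, but if a file's first line is the header itself, the real column names are lost. A small in-memory check, using a made-up tab-separated sample:

```python
import csv
import io

# Hypothetical tab-separated sample: a header row followed by two data rows.
sample = "Frame\t01A\t01B\n1\t0.5\t2.0\n2\t0.7\t2.1\n"

# Without skipping anything, DictReader takes the header row as the field names.
plain = csv.DictReader(io.StringIO(sample), delimiter="\t")
print(plain.fieldnames)  # ['Frame', '01A', '01B']

# Reading one line first (as csv_file.readline()[1:] does) consumes the header,
# so DictReader uses the first *data* row as the field names instead.
handle = io.StringIO(sample)
handle.readline()
shifted = csv.DictReader(handle, delimiter="\t")
print(shifted.fieldnames)  # ['1', '0.5', '2.0']
```

In the second case, looking up `'01A'` on any row raises `KeyError`, because no field is named `01A`.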

But when I tried to make it work for just a single file of my actual data:

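```python
import csv
import matplotlib.pyplot as plt

Distances = []

with open("distances_1.traj", "r") as csv_file:
    csv_file.readline()[1:]  # reads and discards the first line; the [1:] slice is unused
    csv_reader = csv.DictReader(csv_file, delimiter='\t')  # next line is used as the header
    for lines in csv_reader:
        Distance = lines['01A']  # this lookup raises the KeyError
        Distances.append(Distance)

# Histogram of the 01A distances from this file
data = Distances
plt.hist(data, bins='auto')
```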
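Separately from the `KeyError`, `csv.DictReader` returns every cell as a string, so `Distances` would hold text such as `'3.21'` rather than numbers, and `plt.hist` will not treat those as numeric distances. A minimal conversion sketch, assuming blank or missing cells should simply be skipped (the helper name `to_floats` is hypothetical):

```python
from typing import Iterable, List, Optional


def to_floats(cells: Iterable[Optional[str]]) -> List[float]:
    """Convert DictReader string cells to floats, skipping blank or missing cells."""
    values: List[float] = []
    for cell in cells:
        if cell is None:  # DictReader fills short rows with None (its default restval)
            continue
        text = cell.strip()
        if text:
            values.append(float(text))
    return values
```

Once the column is found, plotting would then be `plt.hist(to_floats(Distances), bins='auto')`.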

I get `KeyError: '01A'`.

I'm not sure why `DictReader` doesn't "recognize" the column name `01A`, or how to fix this, so any and all advice is welcome.
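One way to get exactly this error: if the columns in the `.traj` file are separated by runs of spaces rather than single tabs (an assumption, since the file itself isn't shown), `delimiter='\t'` leaves the whole header line as one field, so no key is literally `'01A'`. A sketch that splits on any whitespace instead, assuming the first line is the header and every later non-blank line has the same number of fields (the function name is hypothetical):

```python
from typing import Dict, List, TextIO


def read_whitespace_columns(handle: TextIO) -> Dict[str, List[str]]:
    """Map each header name to its column of values, splitting lines on any whitespace.

    Assumes the first line is the header and every later non-blank line
    has the same number of fields.
    """
    header = handle.readline().split()
    columns: Dict[str, List[str]] = {name: [] for name in header}
    for line in handle:
        fields = line.split()
        if not fields:  # skip blank lines
            continue
        for name, value in zip(header, fields):
            columns[name].append(value)
    return columns
```

Usage would be along the lines of `with open("distances_1.traj") as f: cols = read_whitespace_columns(f)` followed by `plt.hist([float(v) for v in cols["01A"]], bins='auto')`. Note that this reads the header from the very first line, so it does not skip a line the way the question's code does.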

Riley
  • Please provide the top few lines of your file(s). Without seeing your data, I doubt anyone would be able to figure out where the problem is (besides the fact that clearly csv doesn't read your headers properly with your real files) – Diziet Asahi May 15 '19 at 20:01
  • I don't know how large your "large files" are, but I'd suggest starting by using `pandas` to import the data through its `read_csv()` function (you'll find many examples on this site, e.g. [here](https://stackoverflow.com/questions/33642951/)), which also has functions for plotting histograms directly. – Asmus May 15 '19 at 20:10
  • @DizietAsahi I didn't add anything from the files initially because it's 53 columns wide and thus difficult to visualize – Riley May 16 '19 at 00:11
  • Either the file is missing the '01A' column or the column name has some leading or trailing whitespace. – snakecharmerb May 17 '19 at 06:03
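On Diziet Asahi's request and the concern that 53 columns are hard to show: printing just the first few lines with `repr` makes tabs appear as `\t` and exposes stray spaces, which is usually enough to see how the header is laid out. A minimal sketch (the helper name `head` is just for illustration):

```python
from itertools import islice
from typing import List, TextIO


def head(handle: TextIO, n: int = 3) -> List[str]:
    """Return the first `n` lines of an open text file as repr strings,
    so separators such as tabs and runs of spaces are visible."""
    return [repr(line) for line in islice(handle, n)]
```

For example: `with open("distances_1.traj") as f: print("\n".join(head(f)))`.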
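Following Asmus's `pandas` suggestion, a sketch of the same combine-one-column task with `pandas.read_csv` is below. It assumes `pandas` is installed and that each file is tab-separated with one header row; header names are stripped of surrounding whitespace before the lookup. The function name `combine_column` is hypothetical.

```python
from typing import Iterable, TextIO, Union

import pandas as pd


def combine_column(sources: Iterable[Union[str, TextIO]], column: str) -> pd.Series:
    """Read each tab-separated source with pandas and stack one column from all of them.

    Assumes every source has a header row containing `column`.
    """
    pieces = []
    for source in sources:
        frame = pd.read_csv(source, sep="\t")
        frame.columns = frame.columns.str.strip()  # ' 01A ' -> '01A'
        pieces.append(frame[column])
    return pd.concat(pieces, ignore_index=True)
```

If the files turn out to be space-padded rather than tab-separated, `sep=r"\s+"` is the usual `read_csv` setting to try instead. The combined series can then be plotted with `plt.hist(combined, bins='auto')`.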
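To act on snakecharmerb's point about leading or trailing whitespace while staying with `csv`, the reader's `fieldnames` can be stripped before any rows are read, and a missing column can be reported together with the closest header names. A sketch, assuming a tab-separated file whose header is its first line (the function name `read_column` is hypothetical):

```python
import csv
import difflib
from typing import List, TextIO


def read_column(handle: TextIO, column: str, delimiter: str = "\t") -> List[str]:
    """Return one column from a delimited file, tolerating stray whitespace in header names.

    Raises KeyError naming the closest header names if `column` is still missing.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    if reader.fieldnames is None:  # empty file: no header row at all
        raise KeyError(column)
    # Strip surrounding spaces so ' 01A' or '01A ' matches '01A'.
    reader.fieldnames = [name.strip() for name in reader.fieldnames]
    if column not in reader.fieldnames:
        hints = difflib.get_close_matches(column, reader.fieldnames)
        raise KeyError(f"{column!r} not found; closest header names: {hints}")
    return [row[column] for row in reader]
```

If the error message shows a single enormous header name, the delimiter (not the column name) is the likely problem.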
