I have a CSV file whose data looks similar to this:

Year     Age
2001    58
2006    52
2006    12
2001    50
2012    59
2017    46

I want to extract these two columns into two different lists.

with open('age.csv', 'r') as files:
        next(files) # skip header

        for row in files:
            years = row[0]

return years, average_age

But this only gives me 20, 20, 20, which is not what I want.

What I want is something like:

years = [2001, 2006, 2006, 2001, blabla]

For age, however, I am planning to compute the average age for each year, and I don't know how to do that in this case.

colbyjackson
  • You need to split your row at commas first: `row=row.split(',')` – Julien Oct 16 '17 at 00:24
  • Or use `csv`, since it was designed for handling csv files. – Ignacio Vazquez-Abrams Oct 16 '17 at 00:25
  • In any case, you get only the last line with this code, unless you append `row[0]` and `row[1]` to a `list`. – Unni Oct 16 '17 at 00:27
  • Possible duplicate of [parsing a tab-separated file in Python](https://stackoverflow.com/q/11059390/937153) – Unni Oct 16 '17 at 00:29
  • Have you tried `Pandas`? – Akshay Oct 16 '17 at 01:11
  • I am actually not trying to use Pandas in this case. I would rather extract the data by keeping the total age and the count for each year, just to practice extracting data. Besides, I still haven't mastered pandas, so I think a dictionary would be a good choice here. – colbyjackson Oct 16 '17 at 01:19

2 Answers

You have opened the file and are reading it line by line. Each `row` is therefore a string, so `row[0]` and `row[1]` refer to the first and second characters of the line. In this case those happen to be `2` and `0`, the start of the year.
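
A quick illustration of the difference, as a minimal sketch. The sample line is made up to match the data in the question, and it assumes the fields are separated by a comma:

```python
line = "2001,58\n"  # assumed format: one comma-separated line from age.csv

# Indexing the raw string yields single characters, not fields.
first_char, second_char = line[0], line[1]

# Splitting on the comma yields the two fields, which can then be converted.
fields = line.strip().split(",")
year, age = int(fields[0]), int(fields[1])

print(first_char, second_char)
print(year, age)
```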

You need to take it one step further and parse the file as CSV rather than as a series of plain lines. The documentation for Python's standard `csv` module should help. Meanwhile, here is a snippet from that documentation that may jump-start the process (adjust `delimiter` and `quotechar` to match your file):

>>> import csv
>>> with open('eggs.csv', newline='') as csvfile:
...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
...     for row in spamreader:
...         print(', '.join(row))

Further, your loop iterates over the entire file but overwrites `years` on every pass, so only the value from the last line is returned. That looks like a bug; append each value to a list instead.
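
Putting those two points together, here is a rough sketch of how `csv.reader` could fill two lists. It assumes the real file is comma-separated with a `Year,Age` header; the function name `read_years_and_ages` and the in-memory `sample` are only for illustration and stand in for `open('age.csv', newline='')`:

```python
import csv
import io

# Stand-in for open('age.csv', newline=''); assumes comma-separated values.
sample = io.StringIO(
    "Year,Age\n2001,58\n2006,52\n2006,12\n2001,50\n2012,59\n2017,46\n"
)


def read_years_and_ages(csvfile):
    """Return two lists, the years and the ages, both converted to int."""
    reader = csv.reader(csvfile, skipinitialspace=True)
    next(reader)  # skip the header row
    years, ages = [], []
    for row in reader:
        if not row:  # ignore blank lines
            continue
        years.append(int(row[0]))
        ages.append(int(row[1]))
    return years, ages


years, ages = read_years_and_ages(sample)
print(years)
print(ages)
```

If the file turns out to be tab- or space-separated, passing a different `delimiter` to `csv.reader` should be the only change needed.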

hunteke

You need to split each line on the comma and strip the surrounding whitespace.

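def read_columns(path='age.csv'):
    list_year = []
    list_age = []
    with open(path, 'r') as f:
        next(f)  # skip header
        for row in f:
            # split on the comma, then strip spaces and the trailing newline
            year, age = (s.strip() for s in row.split(','))
            list_year.append(year)  # values are kept as strings
            list_age.append(age)
    return list_year, list_age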
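
For the average age per year, one possible approach is to keep a running total and a count for each year in dictionaries, as the asker suggested in the comments. This is only a sketch: the function name `average_age_by_year` is made up, and the two sample lists mirror the data from the question as the strings the code above would produce:

```python
from collections import defaultdict


def average_age_by_year(years, ages):
    """Map each year to the mean of the ages recorded for it."""
    totals = defaultdict(int)  # year -> sum of ages
    counts = defaultdict(int)  # year -> number of rows
    for year, age in zip(years, ages):
        year, age = int(year), int(age)  # the input lists hold strings
        totals[year] += age
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in totals}


list_year = ['2001', '2006', '2006', '2001', '2012', '2017']
list_age = ['58', '52', '12', '50', '59', '46']
averages = average_age_by_year(list_year, list_age)
print(averages)
```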
LYF