How to loop over 2D matrix with condition

Question

I am new to Python and am trying to figure out how to loop over a two-column 2D matrix.

My starting point is a *.csv file, which contains a huge amount of data (10 columns with 173828 rows). Because I only need the first column (sigma_0 values) and the sixth column (date), I made a matrix called sigma_JD, which only contains these two columns:

    import csv
    import numpy as np

    with open("C:/Users/.../03971822.csv") as input_file:
        reader = csv.reader(input_file)

        array = []
        for row in reader:
            array.append(row)

    matrix = np.asmatrix(array)
    idx_IN_columns = [0, 5]
    sigma_JD = matrix[:, idx_IN_columns]
    print(sigma_JD)
    print("size sigma_JD: ", np.shape(sigma_JD))

    >>> print(sigma_JD)
    [['-12.42' '2451544.576']
     ['-12.92' '2451544.576']
     ['-12.45' '2451544.576']
     ..., 
     ['-11.66' '2454688.389']
     ['-12.61' '2454688.389']
     ['-11.72' '2454688.389']]
    >>> print("size sigma_JD: ", np.shape(sigma_JD))
    size sigma_JD:  (173828, 2)
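The printed values are quoted strings, so `sigma_JD` holds text rather than numbers and the JD column cannot be compared numerically. A minimal sketch, assuming the file is comma-separated with no header row (add `skiprows=1` if it has one), that loads only columns 0 and 5 as floats with `np.loadtxt`; the inline `sample` text is a made-up stand-in for the real file:

```python
import io
import numpy as np

# Stand-in for the real CSV: 10 columns per row, only columns 0 and 5 are needed.
# Assumption: comma-separated, no header row.
sample = io.StringIO(
    "-12.42,0,0,0,0,2451544.576,0,0,0,0\n"
    "-12.92,0,0,0,0,2451544.576,0,0,0,0\n"
    "-11.66,0,0,0,0,2454688.389,0,0,0,0\n"
)

# usecols keeps sigma_0 (column 0) and the Julian Day (column 5), parsed as float64
sigma_JD = np.loadtxt(sample, delimiter=",", usecols=(0, 5))

print(sigma_JD.shape)
print(sigma_JD.dtype)
```

With the real file, pass its path instead of `sample`; the result is a plain float `ndarray` of shape (rows, 2).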

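Now I want to loop over the second column, the date. It is stored as a "Julian Day" (JD); for example, the JD values

    2451544.5 = 01/January/2000 0:00
    2451545.5 = 02/January/2000 0:00
    2451546.5 = 03/January/2000 0:00

The integer part encodes the date and the fractional part the time of day. Because Julian Days start at noon (UT), a calendar day starting at 0:00 corresponds to a value ending in .5, so one day runs from x.5 up to, but not including, (x + 1).5.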
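To turn a JD into a calendar date, a minimal standard-library sketch can use the fact that the Unix epoch (1970-01-01 00:00 UT) is JD 2440587.5. The helper name `jd_to_datetime` is hypothetical:

```python
from datetime import datetime, timedelta

UNIX_EPOCH_JD = 2440587.5  # Julian Day of 1970-01-01 00:00 UT

def jd_to_datetime(jd):
    """Convert a Julian Day (float) to a naive UTC datetime (hypothetical helper)."""
    return datetime(1970, 1, 1) + timedelta(days=jd - UNIX_EPOCH_JD)

print(jd_to_datetime(2451544.5))            # 2000-01-01 00:00:00
print(jd_to_datetime(2451545.5))            # 2000-01-02 00:00:00
print(jd_to_datetime(2451544.576).date())   # 2000-01-01
```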

I want to write code that considers all the sigma_0 values that fall within one day, so the loop should work with the JD values themselves, not with row indices.

It should start with 2451544.5, take all the sigma_0 values within that day into account (and sum them up), then go to the next day, 2451545.5, and do the same, and so on.
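The day-by-day summation can also be done without an explicit loop. A minimal NumPy sketch, assuming `sigma_JD` is already a float array (the rows are the sample values from the printout above). `floor(JD + 0.5)` gives the same integer for every timestamp between 0:00 and 24:00 UT of one calendar day; `np.unique(..., return_inverse=True)` labels each row with its day and `np.bincount` sums the sigma_0 values per label:

```python
import numpy as np

# Sample rows from the printout: column 0 = sigma_0, column 1 = Julian Day
sigma_JD = np.array([
    [-12.42, 2451544.576],
    [-12.92, 2451544.576],
    [-12.45, 2451544.576],
    [-11.66, 2454688.389],
    [-12.61, 2454688.389],
    [-11.72, 2454688.389],
])

sigma = sigma_JD[:, 0]
jd = sigma_JD[:, 1]

# Constant from 0:00 to 24:00 UT of one calendar day
day = np.floor(jd + 0.5).astype(np.int64)

# days: sorted unique day numbers; inv: position in `days` for every row
days, inv = np.unique(day, return_inverse=True)
sums = np.bincount(inv, weights=sigma)   # sum of sigma_0 per day
counts = np.bincount(inv)                # number of values per day
means = sums / counts                    # mean sigma_0 per day

for d, s, n, m in zip(days, sums, counts, means):
    print(d, round(s, 2), n, round(m, 3))
```

This makes a single pass over the data regardless of how many days it spans, which matters for 173828 rows.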

I tried something like this, but it doesn't work:

    x = 2451544.5
    y = x + 1
    for i in sigma_JD[:, 1]:
        while x < y:
            print(sigma_JD[i, 1])
            break
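For comparison, a value-based loop in the spirit of this attempt can work if the day window actually advances and the rows are picked with a boolean mask instead of by index. A minimal sketch, assuming a float array `sigma_JD` (not the string matrix); the four sample rows are taken from the test file in the accepted answer:

```python
import numpy as np

sigma_JD = np.array([
    [-12.42, 2451544.576],
    [-12.92, 2451544.576],
    [-5.92, 2451545.677],
    [-2.92, 2451545.699],
])

x = 2451544.5                    # start of the first day (0:00 UT)
last = sigma_JD[:, 1].max()      # stop once the window passes the last date
daily_sums = {}

while x <= last:
    y = x + 1                    # end of the current day
    in_day = (sigma_JD[:, 1] >= x) & (sigma_JD[:, 1] < y)   # rows within [x, y)
    if in_day.any():             # skip days without measurements
        daily_sums[x] = sigma_JD[in_day, 0].sum()
    x = y                        # advance to the next day

print(daily_sums)
```

This rescans the whole column once per day, so over several years of data it is slower than a single grouping pass.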

Then I thought about writing my own function, but didn't get very far:

    def select(x):
        count = 2451544.5
        select = []
        for i in range(0, len(x[:, 1])):   # loop over Julian Day
            if count < count + 1:
                row = []
                for j in range(0, len(x[:, 0])):   # loop over sigma_0 values
                    pass   # take all sigma_0 values and sum them up
                count += 1
        return select
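If the rows are sorted by date, as the printout suggests (an assumption about the whole file), a function like `select` needs only one pass: `itertools.groupby` yields runs of consecutive rows that share a day key. A minimal sketch; `daily_stats` is a hypothetical name and `rows` is any iterable of `(sigma_0, jd)` float pairs:

```python
import math
from itertools import groupby

def daily_stats(rows):
    """Yield (day_number, sum, count, mean) for rows of (sigma_0, jd).

    Assumes rows are already sorted by jd; unsorted input would split
    one day into several groups.
    """
    # floor(jd + 0.5) is the same for every time between 0:00 and 24:00 UT of a day
    for day, group in groupby(rows, key=lambda r: math.floor(r[1] + 0.5)):
        values = [sigma for sigma, _ in group]
        total = sum(values)
        yield day, total, len(values), total / len(values)

rows = [(-12.42, 2451544.576), (-12.92, 2451544.576),
        (-5.92, 2451545.677), (-2.92, 2451545.699)]

for day, total, n, mean in daily_stats(rows):
    print(day, round(total, 2), n, round(mean, 3))
```

A 2-D float `ndarray` also works as `rows`, since each of its rows unpacks into two values; an `np.matrix` does not, because its rows stay 1x2 matrices.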

It would be very, very nice if someone could help me. I have been working on this for days, and it really frustrates me that I can't get it done.

Thanks a lot.

You could consider using pandas for this task. It uses NumPy under the hood, but it lets you read the CSV, convert the date column to dates, and group the data with simple-to-use groupby functions. http://pandas.pydata.org/pandas-docs/stable/index.html — Jan Zeiseweis, Jan 25 '17 at 11:52
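A minimal pandas sketch of this suggestion, assuming the same comma-separated layout with no header row (the inline `sample` text stands in for the file). `pd.to_datetime(..., unit="D", origin="julian")` interprets the numbers as Julian Days, and grouping on the calendar date gives the sum, mean and count per day:

```python
import io
import pandas as pd

# Stand-in for the CSV: sigma_0 in column 0, Julian Day in column 5
sample = io.StringIO(
    "-12.42,0,0,0,0,2451544.576\n"
    "-12.92,0,0,0,0,2451544.576\n"
    "-5.92,0,0,0,0,2451545.677\n"
    "-2.92,0,0,0,0,2451545.699\n"
)

# Assumption: no header row; keep only the two needed columns
df = pd.read_csv(sample, header=None, usecols=[0, 5])
df.columns = ["sigma_0", "jd"]

# Interpret jd as Julian Days and truncate to the calendar date (0:00 UT)
df["date"] = pd.to_datetime(df["jd"], unit="D", origin="julian").dt.floor("D")

daily = df.groupby("date")["sigma_0"].agg(["sum", "mean", "count"])
print(daily)
```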

Jamie Phan · Accepted Answer · 2017-01-26T04:15:34.977

Give this a shot (you may need to install the jdcal package first, e.g. with pip install jdcal):

    import csv
    import collections
    from jdcal import jd2gcal

    with open("test.csv") as input_file:
        reader = csv.reader(input_file)

        jd_sigma_map = collections.defaultdict(float)   # day -> sum of sigma_0
        jd_sigma_count = collections.defaultdict(int)   # day -> number of values

        for row in reader:

            # convert to the normal date format; jd2gcal returns (year, month, day, fraction of day)
            year, month, dd, frac = jd2gcal(float(row[5]), 0)

            # use the zero-padded date as key, so sorting the keys sorts by date
            date_key = '%04d-%02d-%02d' % (year, month, dd)

            # sum and count sigma values for the same key (day)
            jd_sigma_map[date_key] += float(row[0])
            jd_sigma_count[date_key] += 1

File for testing (test.csv):

    -12.42, 0, 0, 0, 0, 2451544.576
    -12.92, 0, 0, 0, 0, 2451544.576
    -5.92,  0, 0, 0, 0, 2451545.677
    -2.92,  0, 0, 0, 0, 2451545.699
    -16.61, 0, 0, 0, 0, 2454688.310
    -11.66, 0, 0, 0, 0, 2454688.389
    -12.61, 0, 0, 0, 0, 2454688.400

To print the sum, count and average per day in date order:

    # For ordered (by date) output
    ordered_dict = collections.OrderedDict(sorted(jd_sigma_map.items()))

    for k, v in ordered_dict.items():
        average = v / jd_sigma_count[k]
        print("Sigma value for day %s = %0.3f \t(over %d values)\tAverage = %0.3f"
              % (k, v, jd_sigma_count[k], average))

Output:

    Sigma value for day 2000-01-01 = -25.340  (over 2 values)   Average = -12.670
    Sigma value for day 2000-01-02 = -8.840   (over 2 values)   Average = -4.420
    Sigma value for day 2008-08-09 = -40.880  (over 3 values)   Average = -13.627

Notes:

- Using a dictionary lets us 'hold' a sigma value for each day, which is easier than fiddling around with a 2D array.
- The output, jd_sigma_map, is a dictionary whose key is the day in the format YYYY-MM-DD and whose value is the total of the sigma values for that day; jd_sigma_count holds the number of values per day. Any key format works as long as each day gets a unique key (and, for sorted output, the keys sort in date order).
- I did all the analysis 'on the fly' while reading the CSV. Alternatively, you can store the rows in a list and do the analysis after the file is closed.
- This is a Python 3 solution (for Python 2.x, change print() to print and items() to iteritems()).
- See this question for sorting the output dictionary (the sorting is already included in the code).

Thanks so far, the code works perfectly. I saved the output to a new file as well. Is there a way to see how many sigma_0 values within one day are taken into account? I actually need the mean of the sigma_0 values for each day (not only the sum). — user7448207, Jan 25 '17 at 19:41
No worries, glad I could help! I've updated the answer as requested - this is a simple solution to the problem. Unfortunately I was unable to think of a 'neater' solution which would take advantage of the existing data structures (e.g. changing the `_map` dictionary to have tuple values `(value, count)`) — Jamie Phan, Jan 26 '17 at 04:08
Thanks again for your help. I am toying with the idea of asking the user a question before the whole dictionary step starts. For example, I have a seventh column with distance values (from 0 to 20,000), and I first let the user answer `distance = input("Pls type in the distance/radius: ")`; depending on that answer, the sigma_0 values would be summed and averaged. Is this possible? — user7448207, Jan 27 '17 at 08:46
Is it feasible to make a key from date and distance? Like `dist_key = (row[6], '%s-%s-%s' % (year, month, dd))`? — user7448207, Feb 01 '17 at 19:09
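Both follow-up ideas fit into one dictionary whose values are `[sum, count]` pairs. A minimal sketch, assuming the distance is in column 6 as described and that a row is kept when its distance is at most the user's radius. The rows and the `radius` value are hypothetical illustration data, and the stdlib JD conversion (Unix epoch = JD 2440587.5) is swapped in for `jdcal` so the sketch has no extra dependency:

```python
import collections
from datetime import datetime, timedelta

# Hypothetical rows: sigma_0 (col 0), JD (col 5), distance (col 6)
rows = [
    ["-12.42", "0", "0", "0", "0", "2451544.576", "150"],
    ["-12.92", "0", "0", "0", "0", "2451544.576", "9000"],
    ["-5.92", "0", "0", "0", "0", "2451545.677", "300"],
    ["-2.92", "0", "0", "0", "0", "2451545.699", "450"],
]

radius = 500.0   # in the real script: float(input("Pls type in the distance/radius: "))

# day -> [sum of sigma_0, number of values]
stats = collections.defaultdict(lambda: [0.0, 0])

for row in rows:
    if float(row[6]) > radius:        # skip rows outside the chosen radius
        continue
    jd = float(row[5])
    day = (datetime(1970, 1, 1) + timedelta(days=jd - 2440587.5)).date()
    stats[day][0] += float(row[0])
    stats[day][1] += 1

for day in sorted(stats):
    total, n = stats[day]
    print(day, "sum = %.3f, count = %d, mean = %.3f" % (total, n, total / n))
```

For a composite key, a tuple such as `(day, float(row[6]))` is also a valid dictionary key, since tuples of hashable values can be used as keys; with continuous distances, though, each distinct distance would form its own group.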

score 0 · Answer 2 · answered Jan 25 '17 at 12:10

    import numpy as np

    array = [['-12.42', '2451544.576'],
             ['-12.92', '2451544.576'],
             ['-12.45', '2451544.576'],
             ['-11.66', '2454688.389'],
             ['-12.61', '2454688.389'],
             ['-11.72', '2454688.389']]

    matrix = np.asmatrix(array)
    print(matrix)
    for (i, j), ele in np.ndenumerate(matrix):
        if j == 1:  # second column
            print(i, j, ele)
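NumPy's documentation no longer recommends `np.matrix`, and slicing a matrix such as `matrix[:, 1]` always stays two-dimensional, which makes looping over a column awkward. A minimal sketch, assuming the same string data, that converts it to a float `ndarray` so a column can be iterated or filtered directly:

```python
import numpy as np

array = [['-12.42', '2451544.576'],
         ['-12.92', '2451544.576'],
         ['-11.66', '2454688.389']]

data = np.array(array, dtype=float)   # strings -> float64, shape (3, 2)

# data[:, 1] is a 1-D array of Julian Days; zip pairs each with its sigma_0
for sigma, jd in zip(data[:, 0], data[:, 1]):
    print(sigma, jd)

# Conditions apply element-wise to the whole column at once:
# sigma_0 values of rows before 2000-01-02 0:00 UT (JD 2451545.5)
first_day = data[data[:, 1] < 2451545.5, 0]
print(first_day)
```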
