
I am trying to create a column that holds each point's coordinates as a numpy array. My data has Easting and Northing columns. I would like to reduce the large numbers simply by shifting them down so that each axis starts at 0. I am testing this with unittest.

I tried to follow other questions that use .apply(lambda), but I can't work out what I have wrong. (I work in pandas 0.9 and can't update it.) Below is some example code; the function I am struggling with is adjustCoordSystem().

import unittest
import pandas as pd
from pandas.util.testing import assert_frame_equal

def exampleDf():
    df = pd.DataFrame({'Easting':{0:11,1:12,2:13,3:14},
                  'Northing':{0:5,1:7,2:9,3:11}})
    return df

def exampWithCoord():
    df = exampleDf()
    df['Sample']=[[0,0,0],[1,2,0],[2,4,0],[3,6,0]]
    return df

class dfProccesedFull():

    def adjustCoordSystem(self, df):
        ''' change coordinate system to get from 0 to max'''
        df['Sample'] = \
        [df['Easting'].apply(lambda x: x - min(df['Easting'])),
         df['Northing'].apply(lambda x: x - min(df['Northing'])),
         df['Northing'].apply(lambda x: 0.0)]

#         [(df['Easting'] - min(df['Easting'])), (df['Northing'] - min(df['Northing'])),\
#          df['Northing'].apply(lambda x: 0.0)]

        return df

class TestDfProccesedDataFull(unittest.TestCase):

    def test_adjustCoordSystem(self):
        df = exampleDf()
        dfModel = exampWithCoord()
        tData =  dfProccesedFull()
        dfTested=tData.adjustCoordSystem(df)
        assert_frame_equal(dfTested, dfModel)

if __name__ == "__main__":
    unittest.main()

I get an error: AssertionError at the line: df['Northing'].apply(lambda x: 0.0)]

How should I change my function so that the column "Sample" holds a list of arrays, without looping through each row?

The output I am looking for is a new dataframe such as:

   Easting  Northing     Sample
0       11         5  [0, 0, 0]
1       12         7  [1, 2, 0]
2       13         9  [2, 4, 0]
3       14        11  [3, 6, 0]

where the "Sample" column is [x-coordinate from Easting, y-coordinate from Northing, z-coordinate = 0]

tomasz74

1 Answer


I'm not sure what this bit was meant to do... you're trying to assign a list of three Series to a single column, so unless df has length three it will fail:

df['Sample'] = [df['Easting'].apply(lambda x: x - min(df['Easting'])),
                df['Northing'].apply(lambda x: x - min(df['Northing'])),
                df['Northing'].apply(lambda x: 0.0)]

See for example:

In [21]: df = pd.DataFrame([[1, 2], [3, 4]], columns=['A', 'B'])

In [22]: df['C'] = [df.copy(), df.copy()]  # use copy to avoid max recursion error...

In [23]: df['C'] = [1, 2, 3]
ValueError: Length of values does not match length of index
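
A minimal sketch of one way around the length mismatch, assuming a reasonably recent pandas (it has not been checked against 0.9): build one `[x, y, z]` list per row, so the assigned value has as many elements as the index. The frame below just mirrors the asker's example data.

```python
import pandas as pd

df = pd.DataFrame({'Easting': [11, 12, 13, 14],
                   'Northing': [5, 7, 9, 11]})

# Shift each axis so that its minimum becomes 0 (vectorised, no apply needed).
x = df['Easting'] - df['Easting'].min()
y = df['Northing'] - df['Northing'].min()

# zip pairs the values up row by row, giving one [x, y, z] list per row,
# so the result has the same length as the index.
df['Sample'] = [[xi, yi, 0] for xi, yi in zip(x.tolist(), y.tolist())]

print(df)
```

The list comprehension still visits every row in Python, so this is a convenience rather than a speed-up; the resulting column has dtype object.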
Andy Hayden
  • Thanks. I am trying to get a new column containing arrays with three coordinates [x,y,z], where the coordinates come from operations on other columns such as Easting/Northing or Longitude/Latitude. I have thousands of entries from sensors. – tomasz74 Mar 05 '14 at 09:30
  • You can zip these three columns before assigning. However, this isn't usually the best way to store data in pandas; it's usually best to keep these as separate columns (why do you want them in one?) – Andy Hayden Mar 05 '14 at 17:15
  • I do some geometrical operations, and I thought it would be easier to have the data represented as full coordinates or vector components. So instead of storing separate columns such as the x-coordinate of the sample, the y-coordinate, etc., I thought I would have a column "Sample" that holds all three coordinates in one array, with the column heading covering all components of the vector. Do you know some good examples of using pandas with coordinates, geometrical operations, vectors ...? Could you give some insight into why single columns are better than columns with all three components in one array? – tomasz74 Mar 05 '14 at 19:22
  • When you store them that way they are stored in an object array rather than as contiguous data in memory. Numpy/pandas operations rely on contiguous data and are written efficiently for it (you can extract the underlying array using `.values`). Many vector/matrix operations are in numpy; it really depends on what you're doing. – Andy Hayden Mar 05 '14 at 19:39
  • Yes, I fully agree. Are there any pros and cons to storing them in three separate long numpy arrays [x0,x1,x2,x2...],[y0,y1,...],[z0,z1,...] instead of one numpy array of arrays [[x0,y0,z0],[x1,y1,z1],[x2,y2,z2]...]? – tomasz74 Mar 05 '14 at 20:25
  • Not so much for numpy float arrays (as you can control how they sit in memory, C or F), but definitely for pandas, as data is always stored in columns (good for column-wise aggregations). In pandas they'll be an *object* array of (pointers to) arrays rather than a contiguous array of floats, unless you make it three columns... – Andy Hayden Mar 05 '14 at 20:29
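
The separate-columns layout discussed in the comments above could look roughly like the sketch below. The column names `x`, `y`, `z` and the distance calculation are illustrative assumptions, not something from the thread; the point is only that three numeric columns can be pulled out as one contiguous float array via `.values`, whereas a column of per-row lists has dtype object.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Easting': [11, 12, 13, 14],
                   'Northing': [5, 7, 9, 11]})

# Keep one numeric column per component (names x, y, z are hypothetical).
df['x'] = df['Easting'] - df['Easting'].min()
df['y'] = df['Northing'] - df['Northing'].min()
df['z'] = 0.0

# Pull the three columns out as a single (n, 3) float array for vector maths.
coords = df[['x', 'y', 'z']].values.astype(float)

# Example vector operation: distance of every sample from the shifted origin.
dist = np.sqrt((coords ** 2).sum(axis=1))

# For comparison, a Series holding one list per row is stored with dtype object.
as_lists = pd.Series(coords.tolist())

print(coords.dtype, as_lists.dtype)
```

Keeping the components as columns means column-wise operations stay vectorised, and the `[x, y, z]` view is still available on demand through `coords`.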