Year  District  Geometry                                            TRUE/FALSE
1900  101       POLYGON ((-89.26355 41.32246, -89.26171 41.322...  TRUE
1902  101       POLYGON ((-89.26355 41.33246, -89.26171 41.322...  FALSE

I have a dataframe with a large number of columns and rows (only a sample is shown above). I am trying to create a new column with a conditional response that is not based on values within the same row. All of the posts I have read so far only cover conditional column creation based on values in another column of the same row.

I want to compare the Geometry column, which is a GeometryArray datatype, with the same geometry column of the same district two years earlier.

Phrased as a question: Is the geometry of district 101 in 1902 the same as district 101 in 1900? TRUE/FALSE

df['geometry change from last year'] = np.where(df['geometry'].at[df.index[i]]!= climate[geometry].at[df.index[i-2]], 'True', 'False')
halfelf
SturgeonNW

1 Answer


Depending on how your rows are actually organized, you could use eq (or its negation, ne) together with a shift.

(partial answer from here)

First create the dummy dataframe:

import pandas as pd

data = {'Year':[1900,1901,1902],
        'District':[101,101,101],
        'Geometry':[
             'POLYGON ((-89.26355 41.32246, -89.26171 41.322))',
             'POLYGON ((-89.26355 41.33246, -89.26171 41.322))',
             'POLYGON ((-89.26355 41.33246, -89.26171 41.322))'],
        }

df = pd.DataFrame(data)
df

The dataframe looks like:

   Year  District                                          Geometry
0  1900       101  POLYGON ((-89.26355 41.32246, -89.26171 41.322))
1  1901       101  POLYGON ((-89.26355 41.33246, -89.26171 41.322))
2  1902       101  POLYGON ((-89.26355 41.33246, -89.26171 41.322))

Then, combining the mentioned functions:

# ne ("not equal") flags rows whose geometry differs from the one two rows earlier
df['changed'] = df['Geometry'].ne(df['Geometry'].shift(2).bfill()).astype(bool)
df

outputs:

   Year  District                                          Geometry  changed
0  1900       101  POLYGON ((-89.26355 41.32246, -89.26171 41.322))    False
1  1901       101  POLYGON ((-89.26355 41.33246, -89.26171 41.322))     True
2  1902       101  POLYGON ((-89.26355 41.33246, -89.26171 41.322))     True

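Note that the first two rows have no row two positions earlier: shift(2) leaves them empty and bfill() fills them with the first available geometry, so their results are not real comparisons and should be checked separately.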
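
If the dataframe holds several districts, a plain shift would compare rows across district boundaries. A minimal sketch of one way around that, assuming the rows can be sorted by district and year and that each district has one row per year, is to shift within each district using groupby. The sample data and the short WKT strings below are hypothetical stand-ins for real geometries:

```python
import pandas as pd

# Hypothetical sample: two districts, WKT text standing in for real geometries.
df = pd.DataFrame({
    "Year": [1900, 1901, 1902, 1900, 1901, 1902],
    "District": [101, 101, 101, 102, 102, 102],
    "Geometry": [
        "POLYGON ((0 0, 1 0, 1 1, 0 0))",
        "POLYGON ((0 0, 2 0, 2 2, 0 0))",
        "POLYGON ((0 0, 2 0, 2 2, 0 0))",
        "POLYGON ((5 5, 6 5, 6 6, 5 5))",
        "POLYGON ((5 5, 6 5, 6 6, 5 5))",
        "POLYGON ((5 5, 6 5, 6 6, 5 5))",
    ],
})

# Put each district's rows in chronological order.
df = df.sort_values(["District", "Year"]).reset_index(drop=True)

# Shift inside each district only, so district 102 is never compared with 101.
previous = df.groupby("District")["Geometry"].shift(2)

# Rows without a counterpart two rows earlier stay missing instead of being back-filled.
df["changed"] = df["Geometry"].ne(previous).where(previous.notna())
print(df[["Year", "District", "changed"]])
```

Leaving the first two rows of each district as missing, rather than using bfill(), avoids reporting a comparison that never actually took place.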

Mat.B
  • Thanks, this definitely gets me thinking. I guess it just doesn't seem to solve the kind of index match approach I was hoping for, because the years and districts do not create equal row numbers that can be shifted with a set value (i.e. 2 in this case). – SturgeonNW Nov 24 '22 at 03:45
  • I found a basic way to sort the rows that sort of solves my previous comment (it will still require lots of manual validation), but when I try the code you provided I get the following error: NotImplementedError: fillna with a method is not yet supported – SturgeonNW Nov 24 '22 at 03:49
  • It appears the error comes from some sort of issue with comparing a GeometryArray. I think I found a workaround by extracting centerpoint lat and lon variables, so I can run two versions of the above code for the x and y coordinates, but when I run it all the results are "False". – SturgeonNW Nov 24 '22 at 04:02
  • I worked it out! Thank you. Following the link to the other post you shared, the approach using .ne instead of .eq worked for me: df['ColumnB'].ne(df['ColumnB'].shift().bfill()).astype(bool) – SturgeonNW Nov 24 '22 at 04:12
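
For the case raised in the comments, where years are irregular and a fixed shift does not line up, one possible alternative is to look up the earlier row by key instead of by position. The sketch below is an assumption-laden illustration, not the accepted approach: it uses plain pandas, hypothetical sample data, and WKT strings in place of a GeometryArray (with GeoPandas, converting the geometry column to text first, for example with to_wkt(), may sidestep the comparison and fillna errors mentioned above). It joins each row to the same district's row from two years earlier:

```python
import pandas as pd

# Hypothetical sample with gaps: district 102 has no row for 1901.
df = pd.DataFrame({
    "Year": [1900, 1902, 1904, 1900, 1903],
    "District": [101, 101, 101, 102, 102],
    "Geometry": [
        "POLYGON ((0 0, 1 0, 1 1, 0 0))",
        "POLYGON ((0 0, 2 0, 2 2, 0 0))",
        "POLYGON ((0 0, 2 0, 2 2, 0 0))",
        "POLYGON ((5 5, 6 5, 6 6, 5 5))",
        "POLYGON ((5 5, 6 5, 6 6, 5 5))",
    ],
})

# Relabel every row with the year it should be compared against (its own year + 2).
earlier = (
    df[["District", "Year", "Geometry"]]
    .rename(columns={"Geometry": "Geometry_prev"})
    .assign(Year=lambda d: d["Year"] + 2)
)

# Left join keeps every original row; rows with no match two years back get NaN.
merged = df.merge(earlier, on=["District", "Year"], how="left")

# True where the geometry differs, missing where there was nothing to compare with.
has_prev = merged["Geometry_prev"].notna()
merged["changed"] = merged["Geometry"].ne(merged["Geometry_prev"]).where(has_prev)
print(merged[["Year", "District", "changed"]])
```

Because the match is made on District and Year rather than on row position, the rows do not need to be sorted or evenly spaced; the trade-off is that a district with no row exactly two years earlier gets a missing value rather than a True/False answer.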