
I have a few polygons and the distances of some points from those polygons. I tried to write them to a CSV with pandas so that the distance between each point and its polygon comes out in a separate row, but I got this:

poly total inside outside dist
1000   2     0      2     [16015,5678]
1100   1     0      1     [5267]

What I want is:

poly total inside outside dist
1000   2    0       2     16015
1000   2    0       2     5678
1100   1    0       1     5267

I tried the following after looking at the previous question "How to write nth value of list into csv file":

distance = []
for row in arcpy.da.SearchCursor(outSide, ["SHAPE@XY"]):
    px, py = row[0]
    zipPoint = point(px, py)
    distance.append(int(zipPoint.calDist(PolyCenter)))
for i in distance:
    df.loc[polygon, "distance"] = distance
    df.loc[zipCode, "Total"] = count
    df.loc[zipCode, "Inside"] = insideNum
    df.loc[zipCode, "Outside"] = outsideNum

But it gives me the same result in the CSV. Any help is appreciated.

khan
  • I just found another thread (http://stackoverflow.com/questions/27263805/pandas-when-cell-contents-are-lists-create-a-row-for-each-element-in-the-list ). Will let you know after trying it. – khan Mar 29 '17 at 05:04

3 Answers


Creating the DataFrame:

import pandas as pd
import numpy as np

df= pd.DataFrame({
        'poly':[1000,1100],
        'total':[2,1],
        'inside':[0,0],
        'outside':[2,1],
        'dist':[[16015,5678],[5267]]
        })

df = df[['poly','total','inside','outside','dist']]

df
Out[]: 
   poly  total  inside  outside           dist
0  1000      2       0        2  [16015, 5678]
1  1100      1       0        1         [5267]

Processing

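# cast the repeat counts to np.intp so np.repeat does not raise
# "Cannot cast array data from dtype('int64') to dtype('int32')" on 32-bit platforms
new_df = pd.DataFrame({
    col: np.repeat(df[col].values, df['dist'].str.len().values.astype(np.intp))
    for col in df.columns.difference(['dist'])
}).assign(**{'dist': np.concatenate(df['dist'].values)})[df.columns.tolist()]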


new_df
Out[]: 
   poly  total  inside  outside   dist
0  1000      2       0        2  16015
1  1000      2       0        2   5678
2  1100      1       0        1   5267
Sayali Sonawane
  • Thanks. I got the error message: TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe' – khan Mar 29 '17 at 19:46
  • I also tried the approach from http://stackoverflow.com/questions/27263805/pandas-when-cell-contents-are-lists-create-a-row-for-each-element-in-the-list but I still get the same table as my input: zip Total / Inside / Outside / distance; 0 77379 / 2 / 0 / 2 / [16015, 5678]; 1 77380 / 1 / 0 / 1 / [5267] – khan Mar 29 '17 at 21:04
  • Try running df=df.astype(np.intp) before the Processing section. – Sayali Sonawane Mar 29 '17 at 21:48
  • I got the error: ValueError: invalid literal for long() with base 10: '[16015, 5678]' – khan Mar 29 '17 at 22:09
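On newer pandas (0.25 and later, released after this thread), DataFrame.explode performs the same reshaping as the np.repeat/np.concatenate approach in the answer above in a single call. A minimal sketch using the same example data; it assumes the dist cells are real Python lists, not strings:

```python
import pandas as pd

df = pd.DataFrame({
    "poly": [1000, 1100],
    "total": [2, 1],
    "inside": [0, 0],
    "outside": [2, 1],
    "dist": [[16015, 5678], [5267]],
})

# explode() emits one row per list element and repeats the other columns;
# the original index labels are repeated too, so reset them afterwards.
new_df = df.explode("dist").reset_index(drop=True)

# Exploded values come back with object dtype; convert them to integers.
new_df["dist"] = new_df["dist"].astype("int64")
print(new_df)
```

Because explode does not go through np.repeat with user-supplied counts, it also avoids the int64/int32 cast error reported in the comments.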

You can use str.len to get the length of each list, repeat the index that many times with numpy.repeat, flatten the lists, and then join the result back to the original columns:

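from itertools import chain

# repeat counts are cast to np.intp to avoid the int64 -> int32 cast error on 32-bit platforms
s = pd.Series(list(chain.from_iterable(df.dist)),
              index=np.repeat(df.index.values, df.pop('dist').str.len().values.astype(np.intp))).rename('dist')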
print (s)
0    16015
0     5678
1     5267
Name: dist, dtype: int64

print (df.join(s).reset_index(drop=True))
   poly  total  inside  outside   dist
0  1000      2       0        2  16015
1  1000      2       0        2   5678
2  1100      1       0        1   5267

Another solution with MultiIndex:

names = ['poly','total', 'inside','outside']
df = df.set_index(names)
mux = pd.MultiIndex.from_tuples(np.repeat(df.index.values, df.dist.str.len().values.astype(np.intp)), names=names)
df2 = pd.DataFrame({'dist':list(chain.from_iterable(df.dist))}, index=mux).reset_index()
print (df2)
   poly  total  inside  outside   dist
0  1000      2       0        2  16015
1  1000      2       0        2   5678
2  1100      1       0        1   5267
jezrael
  • Thanks. It looks like mine is a 32-bit platform. I got the error message: TypeError: Cannot cast array data from dtype('int64') to dtype('int32') according to the rule 'safe'. How can I convert int64 to int32? – khan Mar 29 '17 at 19:56
  • It seems to be a NumPy error; unfortunately I don't know how to reproduce it or how to solve it... :( – jezrael Mar 29 '17 at 20:12
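Another option is to avoid the list column altogether: in a loop like the one in the question, emit one record per distance and build the DataFrame once at the end. A minimal sketch under stated assumptions: the arcpy cursor and the question's point/calDist helpers are replaced by a hypothetical hard-coded results dict, and total is assumed to be inside + outside. Since it never calls np.repeat, it also sidesteps the int64/int32 cast error discussed above.

```python
import pandas as pd

# Hypothetical stand-in for the arcpy loop's output:
# polygon id -> (number of inside points, distances of the outside points)
results = {
    1000: (0, [16015, 5678]),
    1100: (0, [5267]),
}

records = []
for poly, (inside, distances) in results.items():
    outside = len(distances)
    for d in distances:
        # One row per (polygon, distance) pair; the summary columns repeat.
        records.append({
            "poly": poly,
            "total": inside + outside,  # assumption: total = inside + outside
            "inside": inside,
            "outside": outside,
            "dist": d,
        })

df = pd.DataFrame(records, columns=["poly", "total", "inside", "outside", "dist"])
print(df)
# df.to_csv("distances.csv", index=False)  # hypothetical output path
```

Note that a polygon with no outside points produces no rows in this sketch; if such polygons must still appear in the CSV, they would need a separate row with an empty distance.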

I got a solution:

df2 = df.fillna(0)
s = df2.apply(lambda x: pd.Series(x['Distances']), axis=1).stack().reset_index(level=1, drop=True)
s.name = "Distance"
df3 = df2.drop("Distances", axis=1).join(s)

The result looked like:

32811  253  221  32    20
32811  253  221  32  3015
32811  253  221  32  2010

Thank you for your help. I would still appreciate a solution for this error: ValueError: invalid literal for long() with base 10: '[16015, 5678]'

khan
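The ValueError quoted in this thread shows the cell as the string '[16015, 5678]', which suggests (an assumption, not confirmed in the thread) that the dist column holds the text form of the lists, for example after the frame was written to CSV and read back. If so, the strings can be parsed back into real lists with ast.literal_eval before applying any of the answers. A minimal sketch with a hypothetical frame:

```python
import ast

import pandas as pd

# Hypothetical frame where the lists were stored as strings (e.g. after read_csv).
df = pd.DataFrame({
    "poly": [1000, 1100],
    "dist": ["[16015, 5678]", "[5267]"],
})

# literal_eval safely parses Python literals such as "[16015, 5678]" into lists.
df["dist"] = df["dist"].apply(ast.literal_eval)

# Each cell is now a list of ints, so str.len() counts elements, not characters.
lengths = df["dist"].str.len()
print(lengths.tolist())
```

Before the conversion, str.len() on these cells would count characters (13 and 6), which is one way a string column can silently produce wrong results instead of an error.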