1

I have a pandas data frame that looks something like this

| id | name | latlon | 0 sat -28,14 | -23, 12 | -21, 13...

the latlon column entry contains multiple latitude/longitude entries, seperated with the | symbol, I need to split them into a list as follows: lat = [-28,-23,-21] lon = [14,12,13]

running the following command will create a list of all the values

sat_df["latlon"]= sat_df["latlon"].str.split("|", expand=False)

example:indexnumber [-58.562242560404705,52.82662430990185, -61.300361184039964,64.0645716165538, -62.8683906074927,76.96557954998904, -63.078154849236505,90.49660509514713, -61.95530287454162,103.39930010176977, -59.727998547544765,114.629246065411, -56.63116878989326,124.07501384844198, -52.9408690779807,131.75498199669985, -48.85803704806645,137.9821558270659, -44.56621244973711,143.03546934613863, -40.08092215592037,147.27807367743728, -35.5075351924213,150.86679792543603,]

how can I continue to split the data, so each other entry is assgined to the lat/lon list respectivley, for the entire dataframe. Alternativley, is there some way to create two columns (lat/lon) which both hold a list object with all the values? EDIT:

import pandas as pd

sat_df = pd.DataFrame({'卫星编号': {0: 38858, 1: 5, 2: 16}, 'path': {0: '-2023240,1636954,-1409847|-2120945,1594435,-1311586|-2213791,1547970,-1209918|', 1: '8847,-974294,-168045|69303,-972089,-207786|129332,-963859,-246237|189050,-949637,-283483|', 2: '283880,751564,538726|214030,782804,550729|142133,808810,558964|69271,829348,563411|'}, 'latlon': {0: '-28.566504816706743,-58.42623323318429|-26.424915546197877,-58.03051668423269|-24.24957760771616,-57.709052434729294|-22.049419348341488,-57.45429550739338|-19.82765114196696,-57.258197633964414|-17.58719794818057,-57.113255687570714|-15.33074070109176,-57.01245109909582|-13.060755383916138,-56.949188922655416|-10.779548173615462,-56.91723753411087|-8.48928513939462,-56.910669632641685|-6.192021225701933,-56.92380598464241|-3.8897270110140494,-56.951159278680606|-1.5843114029280712,-56.987381318629815|0.7223533959819478,-57.02721062232328|3.028411197431552,-57.06542107180802|5.331999106238248,-57.09677071391785|7.631224662503422,-57.115951252231326|9.924144733525859,-57.11753523668981|12.20873984934678,-57.09592379302077|14.482890506579363,-57.045292032888945|16.744349099342163,-56.95953284633186|18.99070929829218,-56.83219872719919|', 1: '-9.826016080133869,71.12640824438319|-12.077961267269185,74.17040194928683|-14.251942328865088,77.22102880126546|-16.362232784638383,80.31943171515469|-18.372371674164317,83.43158582640798|-20.311489634835258,86.62273098947678|-22.14461262803909,89.85609377674561|-23.896490600856566,93.19765633031801|-25.53339979617313,96.60696767976263|-27.063070616439813,100.12254137641649|-28.488648081761962,103.78528610926675|-29.778331008010497,107.54645547637602|-30.942622037767002,111.47495996053523|-31.95152016226762,115.51397654947516|-32.80866797590735,119.73211812295206|-33.486858278098815,124.06227007574186|-33.98257678066123,128.57116785317814|-34.27304876808886,133.17990028392123|-34.34804732039687,137.91355482600457|-34.19053759979979,142.79776551711302|-33.788689805715364,147.73758823197466|-33.12248489727676,152.7937677542324|', 2: '34.00069374375586,-130.03583418452314|34.3070000099521,-125.16691893340256|34.37547230320849,-120.37930544344802|34.219644836708575,-115.72548686095767|33.8599777210809,-111.25048787484094|33.307236654159695,-106.89130089454063|32.579218893589676,-102.68672977394559|31.69071108398145,-98.63657044455137|30.663892680279847,-94.76720076317056|29.49498481622457,-91.01231662520239|28.20247456939903,-87.39472628213446|26.796048279088225,-83.90476041381801|25.29620394685256,-80.5572008057606|23.686627724590036,-77.28791855670698|21.984668849769005,-74.1108962902788|20.209508481020038,-71.0367205896831|18.337433788359615,-68.00383542959851|16.385207987194672,-65.02251732177939|14.355346635752394,-62.078279068092414|12.266387624465171,-59.17870114389838|10.087160866120724,-56.262880710180255|7.8348695447113235,-53.336971029542006|'}})

#splits latlon data into a list
sat_df.dropna(inplace=True)
sat_df["latlon"]= sat_df["latlon"].str.split("|", expand=False)

sat_df

#need to write each entries latlon list as two lists (alternating lat and lon)
lat = []
lon = []

#for sat_df["latlon"]:



Nad A.A.
  • 11
  • 2
  • 2
    as you're new have a read of [mcve] and [how to ask a good pandas question](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – Umar.H Apr 18 '20 at 17:16
  • Hi, Here is some example code with a CSV file (in txt format) ` import pandas as pd sat_df = pd.read_csv("sat.txt") sat_df.dropna(inplace=True) sat_df["latlon"]= sat_df["latlon"].str.split("|", expand=False) sat_df ` ##Link to CSV: [link](https://cdn.discordapp.com/attachments/614466806580641906/701120711749664788/sat.txt) – Nad A.A. Apr 18 '20 at 17:23
  • Hi, I tried to update the post but it gives the error: Body is limited to 30000 characters; you entered 45882. – Nad A.A. Apr 18 '20 at 17:36
  • seems like the data entry it too large to add to a comment – Nad A.A. Apr 18 '20 at 17:38
  • 1
    Hi, I made an edit with some sample data. It should be possible to run in python/jupyter – Nad A.A. Apr 18 '20 at 18:06
  • 1
    yes, I didnt autoformat it afterwards – Nad A.A. Apr 18 '20 at 18:39

2 Answers2

0

lets go a step back from your str.strip and make use of explode which was added in pandas 0.25

then merge it back based on the index.

df = sat_df['latlon'].str.split('|').explode().str.split(',',expand=True)

new_df = pd.merge(sat_df.drop('latlon',axis=1),
df,left_index=True,
right_index=True).rename(columns={0 : 'Lat', 1 : 'Lon'})

print(new_df.drop('path',axis=1))
     卫星编号                  Lat                  Lon
0   38858  -28.566504816706743   -58.42623323318429
0   38858  -26.424915546197877   -58.03051668423269
0   38858   -24.24957760771616  -57.709052434729294
0   38858  -22.049419348341488   -57.45429550739338
0   38858   -19.82765114196696  -57.258197633964414
..    ...                  ...                  ...
2      16   14.355346635752394  -62.078279068092414
2      16   12.266387624465171   -59.17870114389838
2      16   10.087160866120724  -56.262880710180255
2      16   7.8348695447113235  -53.336971029542006
2      16                                      None
Umar.H
  • 22,559
  • 7
  • 39
  • 74
0

For this purpose we are using pandas library.

Initially I have created a dataframe as you have mentioned.

Code:

import pandas as pd

latlon = [-58.562242560404705,52.82662430990185, -61.300361184039964,64.0645716165538, -62.8683906074927,76.96557954998904, -63.078154849236505,90.49660509514713, -61.95530287454162,103.39930010176977, -59.727998547544765,114.629246065411, -56.63116878989326,124.07501384844198, -52.9408690779807,131.75498199669985, -48.85803704806645,137.9821558270659, -44.56621244973711,143.03546934613863, -40.08092215592037,147.27807367743728, -35.5075351924213,150.86679792543603,]
# print(latlon)
data = pd.DataFrame({'id':[0],'name':['sat'],'latlon':[latlon]})

print(data)

Output:

   id name                                             latlon
0   0  sat  [-58.562242560404705, 52.82662430990185, -61.3...

Now I've converted the latlon to string in order to iterate because if you try to iterate float value you may get error. Then we are passing the lattitude and longitude values to corresponding columns of the dataframe.

This code will work even if you have more any number of records or rows in your dataframe.

Code:

#splittint latlon and making adding the values to lat and lon columns
lats = []
lons = []
for i in range(len(data)):
    lat_lon = [str(x) for x in (data['latlon'].tolist()[i])]

    lat = []
    lon = []
    for i in range(len(lat_lon)):
        if i%2==0:
            lat.append(float(lat_lon[i]))
        else:
            lon.append(float(lat_lon[i]))
    lats.append(lat)
    lons.append(lon)

data = data.drop('latlon',axis=1)  #dropping latlon column
data.insert(2,'lat',lats)  #adding lat column
data.insert(3,'lon',lons)  #adding lon column
# print(data)
data  #displaying dataframe

Output:

    id  name    lat     lon
0   0   sat     [-58.562242560404705, -61.300361184039964, -62...   [52.82662430990185, 64.0645716165538, 76.96557...

I hope it would be helpful.

Littin Rajan
  • 852
  • 1
  • 10
  • 21