
I have data in a GeoDataFrame. It contains a column named neighbourhood_list, which holds the list of all the neighbourhood codes of a route. What I want is to create a nested list of pairs in which the end element of each pair is the start element of the next, because I want to generate an OD directed network (the pairs will be the edges), so order matters.

To make it a bit clearer, here is some code.

Here is one record from the dataframe, on which I tried a rough workaround to get the desired result:

list= [15,30,9,7,8]
new_list=[]
for i in range(len(list)-1):
    new_list.append(list[i])
    new_list.append(list[i+1])

The code above gives a combined list, which I then split into the pairs I need:

chunks = [new_list[x:x+2] for x in range(0, len(new_list), 2)]
chunks

The actual data is [15,30,9,7,8] and the desired output is [[15, 30], [30, 9], [9, 7], [7, 8]].

I worked out the code above from the answer to: Split a python list into other "sublists" i.e smaller lists

However, the real issue now is how to apply it in pandas.

So far I have been trying to adapt the approach described here: https://chrisalbon.com/python/data_wrangling/pandas_list_comprehension/

Here is some incomplete code. I am not sure it is correct, but I thought that if I could somehow get the length of the list in each row of the neighbourhood_list column, then maybe I could accomplish this:

for row in df['neighbourhood_list']:
    for i in range ??HOW TO GET range(len) of each row??
    new.append(row[i])
    new.append(row[i+1])

Note: as a layman, I don't know how nested loops or lambda functions work, or whether there is a pandas function available for this task. Another idea, also mentioned on Stack Overflow, is something like the following, but I still don't know how to get the length of the list in each row, even if I first write a function and then apply it to my column.

df[["YourColumns"]].apply(someFunction)

Apologies in advance if the question needs more clarification (I can give more details of the problem if needed).

Thanks so much.

  • Actually, we need *fewer* details and more focus. Please provide the expected [MRE - Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example). Show where the intermediate results deviate from the ones you expect. We should be able to paste a single block of your code into a file, run it, and reproduce your problem. This also lets us test any suggestions in your context. – Prune Mar 15 '21 at 00:17
  • Part of the problem seems to be that you are trying to implement a system more complex than your current programming skills can support. Please work through an appropriate tutorial to learn pandas vectorized (full-column) operations, list comprehensions, and functions as arguments. These will give you the vocabulary and skills needed to narrow down what you need for this problem. – Prune Mar 15 '21 at 00:18
  • I am sorry, I should have been more concise and to the point, and indeed my programming skills are way too low. – SpatialAnalyst Mar 15 '21 at 00:39
  • Remember, we've all been you at some point. Even Ada Byron Lovelace had to learn programming, even if she had to invent it first. Take a breath, focus on one skill at a time. – Prune Mar 15 '21 at 00:44

2 Answers


My best guess is that you are trying to create a column containing a list of ordered pairs from a column of lists. If that is the case, something like this should work:

Edit

From what you described, your 'neighbourhood_list' column does not hold lists yet; each item is a string. Add the first line below to turn the column items into lists, then run the pairs apply.

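# Split each comma-separated string into a list of codes (the codes remain strings).
df['neighbourhood_list'] = df['neighbourhood_list'].apply(lambda row: row.split(','))
# Pair each element with its successor: [a, b, c] -> [[a, b], [b, c]].
df['pairs'] = df['neighbourhood_list'].apply(lambda row: [[row[i], row[i+1]] for i in range(len(row)-1)])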
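The desired output in the question uses integers, while split(',') leaves the codes as strings. Here is a minimal sketch of the same idea with an int conversion added. The sample dataframe and the helper name to_pairs are made up for illustration, and it assumes every row is a comma-separated string of whole numbers.

```python
import pandas as pd

# Hypothetical sample data: each route is stored as a comma-separated string.
df = pd.DataFrame({"neighbourhood_list": ["15,30,9,7,8", "4,12"]})


def to_pairs(route):
    """Parse a comma-separated string into ints and pair consecutive codes."""
    stops = [int(code) for code in route.split(",")]
    # zip(stops, stops[1:]) walks the list alongside a copy shifted by one.
    return [[a, b] for a, b in zip(stops, stops[1:])]


df["pairs"] = df["neighbourhood_list"].apply(to_pairs)
print(df["pairs"].tolist())
```

A route with a single code produces an empty list of pairs, since there is no successor to pair it with.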

If I have misunderstood, please let me know and I'll try and adjust accordingly.

Boskosnitch
  • Yes, I am trying to break the list into pairs, for example [15,30,9,7,8] into [[15, 30], [30, 9], [9, 7], [7, 8]], but the code above splits the list into individual characters. Here is a result from the above code: [['1', '5'], ['5', ','], [',', '3'], ['3', '0'], ['0', ','], [',', '9'], ['9', ','], [',', '7'], ['7', ','], [',', '8']] – SpatialAnalyst Mar 15 '21 at 00:42
  • @SpatialAnalyst I edited the answer to format your column into a list – Boskosnitch Mar 15 '21 at 00:55

From the description you posted, it seems that all you're trying to do is get that list of graph edges from an ordered list of nodes. First, it helps to use existing methods to reduce your pairing to a simple expression. In this case, I recommend zip:

stops = [15,30,9,7,8]
list(zip(stops, stops[1:]))

Output:

[(15, 30), (30, 9), (9, 7), (7, 8)]

Note that I changed your variable name: using a built-in type as a variable name is a baaaaaad idea. It disables some of your ability to reference that type.


Now, you just need to wrap that in a simple column expression. In any pandas tutorial, you will find appropriate instructions on using df["neighbourhood_list"] as a series expression.
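As a rough sketch of what that column expression could look like, assuming the column already holds real Python lists (not strings): the sample dataframe and the column names edges, origin and destination are invented for illustration.

```python
import pandas as pd

# Hypothetical sample data: each row holds an ordered list of neighbourhood codes.
df = pd.DataFrame({"neighbourhood_list": [[15, 30, 9, 7, 8], [4, 12, 6]]})

# Apply the zip expression to every row of the column.
df["edges"] = df["neighbourhood_list"].apply(
    lambda stops: list(zip(stops, stops[1:]))
)

# Optional: one row per edge, which is a convenient shape for building a graph.
# explode() yields NaN for rows with no edges, so those are dropped.
edges = df["edges"].explode().dropna()
edge_df = pd.DataFrame(
    edges.tolist(), columns=["origin", "destination"], index=edges.index
)
print(edge_df)
```

The index of edge_df repeats the original row label, so each edge can still be traced back to its route.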

Prune
  • It gives me the error TypeError: 'list' object is not callable, but it is still a useful pointer to search for a solution in this direction; if I can make it work, it would be much easier. Also noted for the future: do not use reserved words as variable names. – SpatialAnalyst Mar 15 '21 at 00:49
  • That's because you disabled the type `list` with your inappropriate variable name. If you run this in a clean environment, it produces the given output. – Prune Mar 15 '21 at 00:52
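The error discussed in these comments can be reproduced in isolation. The sketch below uses two hypothetical functions, broken_pairs and edge_pairs, and confines the shadowing to a function so that it does not affect the rest of the session.

```python
def broken_pairs(stops):
    # Rebinding the name `list` hides the built-in type inside this function.
    list = stops
    # `list` now refers to a list object, so calling it raises TypeError.
    return list(zip(stops, stops[1:]))


def edge_pairs(stops):
    # With no shadowing, `list` resolves to the built-in type again.
    return list(zip(stops, stops[1:]))


try:
    broken_pairs([15, 30, 9, 7, 8])
except TypeError as err:
    print("TypeError:", err)

print(edge_pairs([15, 30, 9, 7, 8]))
```

In an interactive session where `list` was assigned at the top level, running `del list` or restarting the interpreter makes the built-in reachable again.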