I have the following two lists:

indexList = [5,3,2,2,7,1]
valueList = [1,2,3,4,5,6]

I want to sort the two together, so that the output is:

indexList = [1,2,2,3,5,7]
valueList = [6,3,4,2,1,5]

Then, I want to fill in the missing indices, giving each a corresponding value of 0:

indexList = [1,2,2,3,4,5,6,7]
valueList = [6,3,4,2,0,1,0,5]

Lastly, I want to remove repeated indices and sum their values:

indexList = [1,2,3,4,5,6,7]
valueList = [6,7,2,0,1,0,5]

Is there a built-in module that can perform such a task? Could anyone point me in the right direction?

Mark Rotteveel
user7288808
  • Welcome to Stack Overflow. Please split this question by first asking about the first step. Also, if you have any idea about any of the steps, show what you have. In its current shape the question is too broad because it contains several subquestions, and it does not show any of your own research effort. The first step is admittedly hard, but you can still work on the other steps by assuming sorted input. – Yunnosch Aug 23 '18 at 15:04
  • I am fairly certain there isn't a built-in module for such a task. I would recommend coding a sorting algorithm that sorts `valueList` according to the sorted order of `indexList`. – depperm Aug 23 '18 at 15:04
  • Some more pointers, such as how big your dataset is and whether there are any space/time constraints, would also be helpful. – Jay Aug 23 '18 at 15:54
  • @Yunnosch: I understand your point, but my question was "Is there a Python module that can handle all these tasks?" So it wouldn't make sense to split the question up. Also, I didn't want to clog up the question space by adding explanations of my approach. But again, I understand your concern. After all, I got a great answer and learned a lot from one responder. Thank you all. – user7288808 Aug 23 '18 at 19:40

3 Answers

You can use pandas:

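import numpy as np
import pandas as pd

indexList = [5,3,2,2,7,1]
valueList = [1,2,3,4,5,6]
s = pd.Series(valueList, index=indexList)
# Sum the values that share an index (the result is sorted by index).
s = s.groupby(s.index).sum()
# Reindex over the full range min..max, filling missing indices with 0.
s = s.reindex(np.arange(s.index.min(), s.index.max() + 1), fill_value=0)
print(s.index.tolist())
print(s.tolist())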

Output:

[1, 2, 3, 4, 5, 6, 7]
[6, 7, 2, 0, 1, 0, 5]

Details

  • Create a pandas Series using valueList as the data and indexList as the index of the series.
  • Use groupby with sum to combine equal indexes, summing their values. The grouped result comes back sorted by index.
  • Next, reindex the series from the min of the series index to the max of the series index, and use the fill_value parameter to fill missing indexes with 0.
  • Print the series index as a list with tolist.
  • Print the series values as a list with tolist.
Scott Boston

For the first step, you could sort the zip of both lists, i.e. sort a list of tuples:

indexList = [5,3,2,2,7,1]
valueList = [1,2,3,4,5,6]

sorted(zip(indexList, valueList))
# [(1, 6), (2, 3), (2, 4), (3, 2), (5, 1), (7, 5)]

Quote from this answer:

Python sorts tuples and lists like these lexicographically; compare the first element, and only if that doesn't differ, compare the second element, etc.

And if you want to pack the values again into two lists:

indexList, valueList = list(zip(*sorted(zip(indexList, valueList))))

print( indexList, valueList )
# (1, 2, 2, 3, 5, 7) (6, 3, 4, 2, 1, 5)
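
The remaining two steps (filling missing indices with 0 and summing repeated indices) can be built on the same sorted pairs using only the standard library. The following is a minimal sketch, not a definitive implementation: the function name `sum_and_fill` is made up for illustration, and it assumes integer indices and non-empty input.

```python
from itertools import groupby
from operator import itemgetter


def sum_and_fill(index_list, value_list):
    """Hypothetical helper: sort pairs, sum repeated indices, fill gaps with 0."""
    # Sort the (index, value) pairs; tuples compare lexicographically.
    pairs = sorted(zip(index_list, value_list))

    # groupby needs sorted input: each run of equal indices becomes one group.
    sums = {
        idx: sum(value for _, value in group)
        for idx, group in groupby(pairs, key=itemgetter(0))
    }

    # Walk the full range from the smallest to the largest index,
    # using 0 for any index that never appeared.
    full_index = list(range(pairs[0][0], pairs[-1][0] + 1))
    full_values = [sums.get(i, 0) for i in full_index]
    return full_index, full_values


indexList = [5, 3, 2, 2, 7, 1]
valueList = [1, 2, 3, 4, 5, 6]

indexList, valueList = sum_and_fill(indexList, valueList)
print(indexList)  # [1, 2, 3, 4, 5, 6, 7]
print(valueList)  # [6, 7, 2, 0, 1, 0, 5]
```

Summing before filling avoids having to insert into the lists at specific positions; the final lists are the same as in the question.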
xdze2

(First, I would suggest switching the variable names, because it seems valueList=[5,3,2,2,7,1] and indexList=[1,2,3,4,5,6]. The rest of this answer uses the names as switched.)

Instead of using two lists, maybe start with a dictionary whose keys come from [5,3,2,2,7,1] and whose values come from [1,2,3,4,5,6]. Something like: d = {5:1, 3:2, 2:3, 2:4, 7:5, 1:6}. Be aware that a dictionary cannot hold the key 2 twice: written as a literal, the later entry 2:4 silently replaces 2:3, so add the values of repeated keys together as you build the dictionary.

Sort the dictionary keys as per https://www.saltycrane.com/blog/2007/09/how-to-sort-python-dictionary-by-keys/ so that each value stays paired with its key. From here, perhaps separate the keys into one list and the values into another. You can then loop through the key list to find the missing entries, insert them, and insert 0 at the corresponding position in the other list. If the values were summed while building the dictionary, no separate duplicate-removal step is needed. Hope this helps.
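
The dictionary idea can be sketched as follows, using the question's original variable names. This is only an illustration under stated assumptions: the function name `fill_and_sum` is invented here, the indices are assumed to be integers, and the input is assumed to be non-empty. Values are accumulated into the dictionary so that repeated indices are summed rather than overwritten.

```python
from collections import defaultdict


def fill_and_sum(index_list, value_list):
    """Hypothetical helper: accumulate values per index, then fill gaps with 0."""
    # Accumulate instead of assigning, so a repeated index adds to its total.
    totals = defaultdict(int)
    for idx, value in zip(index_list, value_list):
        totals[idx] += value

    # Iterating over a range yields the indices already in sorted order,
    # so no explicit sort of the dictionary keys is required.
    indices = list(range(min(totals), max(totals) + 1))

    # .get does not create new entries; missing indices simply map to 0.
    values = [totals.get(i, 0) for i in indices]
    return indices, values


indexList = [5, 3, 2, 2, 7, 1]
valueList = [1, 2, 3, 4, 5, 6]

new_index, new_values = fill_and_sum(indexList, valueList)
print(new_index)   # [1, 2, 3, 4, 5, 6, 7]
print(new_values)  # [6, 7, 2, 0, 1, 0, 5]
```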

Connor Watson