I have an array like this:

array([['Weather1', 428, '74827'],
       ['weather1', 429, '74828'],
       ['weather1', 409, '74808'],
       ['weather2', 11553, '76568'],
       ['weather2', 11573, '76574']])

I want to return only the [2] values in a new array, grouped by the values in [0].

Final outcome:

array([['74827', '74828', '74808'], ['76568', '76574']])

Any ideas?

xavi
  • My question overlaps heavily with this question: https://stackoverflow.com/questions/38013778/is-there-any-numpy-group-by-function/43094244 – xavi Dec 12 '21 at 19:21

1 Answer

Yes, you can do this:

array = [
        ['Weather1', 428, '74827'],
        ['weather1', 429, '74828'],
        ['weather1', 409, '74808'],
        ['weather2', 11553, '76568'],
        ['weather2', 11573, '76574']
]

read_data = [] # stores Weather1, Weather2 etc. as we read that
final_array = [] # stores final arrays

# stores data for weather1, then clears it out and
# then stores data for weather2, and so on...
sub_array = [] 

# read each item of array
for x in array:

    # e.g. for first row, is Weather1 already read?
    # No, it's not read
    if x[0].lower() not in read_data:

        # when you reach weather 2 and hit this statement,
        # sub_array will have data from weather1. So, if you find
        # sub_array with data, it is time to add it to the final_array
        # and start fresh with the sub_array
        if len(sub_array) > 0:
            final_array.append(sub_array)
            sub_array = [x[2]]
        # if sub_array is empty, just add data to it
        else:
            sub_array.append(x[2])
        
        # make sure that read_data contains the item you read
        read_data.append(x[0].lower())

    # if weather1 has been read already, just add item to sub_array
    else:
        sub_array.append(x[2])

# After you are done reading all the rows, sub_array may still hold data
# for the last group; if so, add it to the final array
if len(sub_array) > 0:
    final_array.append(sub_array)

print(final_array)

Result: [['74827', '74828', '74808'], ['76568', '76574']]

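Assumption: Your data is sequential. That means all the weather1 rows appear together, followed by all the weather2 rows (or some other label), and so on. If rows for the same label are interleaved, a label that reappears later gets appended to whichever group is currently open.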
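
If the rows are not guaranteed to be sequential, one alternative is to collect the values in a dictionary keyed by the lowercased label. This is a minimal sketch of a different technique, not the loop above; the function name `group_by_first_column` and the interleaved sample rows are made up for illustration. It relies on dictionaries preserving insertion order (Python 3.7+), so groups come out in the order their label first appears.

```python
def group_by_first_column(rows):
    """Group the third field of each row by the lowercased first field."""
    groups = {}
    for row in rows:
        key = row[0].lower()  # treat 'Weather1' and 'weather1' as the same label
        groups.setdefault(key, []).append(row[2])
    # dicts keep insertion order, so groups follow first appearance of each label
    return list(groups.values())


# Hypothetical sample: same rows as above, but interleaved
rows = [
    ['Weather1', 428, '74827'],
    ['weather2', 11553, '76568'],
    ['weather1', 429, '74828'],
    ['weather2', 11573, '76574'],
    ['weather1', 409, '74808'],
]

result = group_by_first_column(rows)
print(result)
```

With these sample rows, the values for weather1 and weather2 end up in two separate lists even though the input alternates between labels.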

zedfoxus
  • Contribution: If the data is not sequential, you could also sort it on the "weather..." column first. All the weather1 rows, weather2 rows, etc. will then sit together, and you can split the groups at the indices where the weather label changes. – Richard Valenz Dec 12 '21 at 19:12
  • Why do I get IndexError: index 2 is out of bounds for axis 0 with size 2? – xavi Dec 12 '21 at 19:13
  • @xavi what does your data look like? Are any of your rows missing the last string, like `'74827'`? – zedfoxus Dec 12 '21 at 19:16
  • A kernel restart solved the problem. Thanks – xavi Dec 12 '21 at 19:20
  • @RichardValenz you said it so well. If x[0] is not sequential, just sort the data. Thank you! – zedfoxus Dec 12 '21 at 19:23
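
The sort-first idea from the comments could be sketched with `sorted` and `itertools.groupby` from the standard library. This is only one possible way to do it, and the function name `group_after_sorting` and the sample rows are invented for the example. Note that sorting changes the order of the groups to alphabetical by label; within each group the original row order is kept, because Python's sort is stable.

```python
from itertools import groupby


def group_after_sorting(rows):
    """Sort rows by lowercased label, then split wherever the label changes."""
    def label(row):
        return row[0].lower()

    ordered = sorted(rows, key=label)  # stable sort: ties keep input order
    # groupby only merges adjacent rows with the same key, hence the sort first
    return [[row[2] for row in group] for _, group in groupby(ordered, key=label)]


# Hypothetical sample with interleaved labels
rows = [
    ['weather2', 11553, '76568'],
    ['Weather1', 428, '74827'],
    ['weather1', 429, '74828'],
    ['weather2', 11573, '76574'],
    ['weather1', 409, '74808'],
]

grouped = group_after_sorting(rows)
print(grouped)
```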