0

Let's say I have two arrays. The first holds the timestamps and the second holds the data.

timeStamp = ['0001','0002','0003',...,'9999']

data = [6234,2372,1251,...,5172]

What would be the best way to store them? And how can I sort the data from smallest to largest while keeping each timestamp attached to its value?

6 Answers

2

There are multiple ways of doing this. Let's take the following data:

timeStamp = [9,1,2,3,9999]
data = [1245, 6234,2372,1251,5172]

Using base Python and zip

This is the default way of handling data held in lists. zip lets you quite literally zip two or more lists element-wise, producing tuples of paired elements. You can then use sorted with a lambda function as the key to sort the pairs by a specific position in each tuple (x[0] for the timestamp, x[1] for the data).

l = zip(timeStamp, data) #storing 2 arrays by attaching them elementwise
print(sorted(l, key=lambda x: x[0]))
[(1, 6234), (2, 2372), (3, 1251), (9, 1245), (9999, 5172)]
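One caveat, sketched here under the assumption that Python 3 is in use: zip returns a one-shot iterator rather than a list, so it is empty once sorted has consumed it. Materialising the pairs with list keeps them reusable. The names lazy, pairs, by_time and by_data are only illustrative.

```python
timeStamp = [9, 1, 2, 3, 9999]
data = [1245, 6234, 2372, 1251, 5172]

lazy = zip(timeStamp, data)                  # Python 3: a one-shot iterator
by_time = sorted(lazy, key=lambda x: x[0])   # consumes the iterator
leftover = list(lazy)                        # nothing is left to iterate over

pairs = list(zip(timeStamp, data))           # materialise once, reuse freely
by_data = sorted(pairs, key=lambda x: x[1])  # sort by the data value instead

print(leftover)  # []
print(by_data)   # [(9, 1245), (3, 1251), (2, 2372), (9999, 5172), (1, 6234)]
```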

Using numpy and argsort

NumPy allows you to work with multidimensional arrays. For 2 lists, you can simply np.stack them together to create a 2D array with one row per list.

To sort, call argsort() on the first row (the timestamps), which returns the indexes that would put that row in sorted order. You can then use these indexes to reorder the columns of the original 2D array, which sorts the whole array by timestamp.

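import numpy as np

arr = np.stack([timeStamp, data])  # shape (2, n): row 0 = timestamps, row 1 = data
arr[:, arr[0].argsort()]  # reorder the columns by ascending timestamp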
array([[   1,    2,    3,    9, 9999],
       [6234, 2372, 1251, 1245, 5172]])

Using pandas DataFrames and sort_values

Finally, the best way to work with multiple lists together is to treat them as columns in a DataFrame. pandas provides a handy framework for column/row arranged data, which is very useful here because you can also use column names to identify each array/column.

The sort_values method lets you quickly sort the complete data by a column name.

import pandas as pd

df = pd.DataFrame(zip(timeStamp, data), columns=['timeStamp','data'])
print(df.sort_values('timeStamp'))
   timeStamp  data
1          1  6234
2          2  2372
3          3  1251
0          9  1245
4       9999  5172
Akshay Sehgal
1

It depends on how you want to use it. If you want to avoid any additional library, I would use something like

result = sorted(({"timestamp": ts, "data": d} for ts, d in zip(timeStamp, data)), key=lambda rec: rec["data"])

That is basically a list of dictionaries sorted by data. I would go for a list of dicts because it is more expressive than a list of tuples.
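As a rough sketch of why the named fields can be convenient, the records can be read back by key rather than by position. The sample values are the ones visible in the question (the elided middle is left out), and the names records, smallest and sorted_timestamps are illustrative.

```python
timeStamp = ['0001', '0002', '0003', '9999']
data = [6234, 2372, 1251, 5172]

# One dict per reading, ordered by the "data" field.
records = sorted(
    ({"timestamp": ts, "data": d} for ts, d in zip(timeStamp, data)),
    key=lambda rec: rec["data"],
)

smallest = records[0]
sorted_timestamps = [rec["timestamp"] for rec in records]

print(smallest["timestamp"], smallest["data"])  # 0003 1251
print(sorted_timestamps)  # ['0003', '0002', '9999', '0001']
```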

Simon Hawe
  • I can use an additional library, but I wanted to know what the proper way of doing this would be. So your suggestion is to use a dictionary and sort it with a for loop? Am I correct? – controlsHeaven Jan 09 '22 at 11:43
1

You could use a two-dimensional array. You can create this by using

timestamp_data = [[ts, d] for ts, d in zip(timeStamp, data)]

Now, you can sort this by using

sorted_timestamp_data = sorted(timestamp_data, key=lambda row: row[1])
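If the two lists should stay separate, an alternative is to sort the indexes by their data value and then apply that order to each list. This is only a minimal sketch with small sample values; order, sorted_data and sorted_timeStamp are hypothetical names.

```python
timeStamp = [9, 1, 2, 3, 9999]
data = [1245, 6234, 2372, 1251, 5172]

# Indexes of data in ascending order of value (a pure-Python argsort).
order = sorted(range(len(data)), key=lambda i: data[i])

# Apply the same order to both lists so the pairs stay aligned.
sorted_data = [data[i] for i in order]
sorted_timeStamp = [timeStamp[i] for i in order]

print(order)             # [0, 3, 2, 4, 1]
print(sorted_data)       # [1245, 1251, 2372, 5172, 6234]
print(sorted_timeStamp)  # [9, 3, 2, 9999, 1]
```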
Bloeckchen
  • Hmm, so you are proposing to create a 2D array and sort it with the sort function, which I suppose uses the index of each value to keep the first row attached to the second row? – controlsHeaven Jan 09 '22 at 11:44
  • No, you are creating an array that looks like this: `[[1, 6234], [2, 2372], [3, 1251], [9999, 5172]]` and then you are sorting it by the second value of each sub-array – Bloeckchen Jan 09 '22 at 11:47
1

A dictionary will work really well for you. You can zip data and timeStamp, sort the pairs by data, and then pass the tuples to dict (dictionaries preserve insertion order as of Python 3.7). You then have data-timestamp pairs in which the data values are keys and the timestamps are values.

out = dict(sorted(zip(data, timeStamp)))

Output:

{1251: '0003', 2372: '0002', 5172: '9999', 6234: '0001'}

If you want two separate lists instead, unpack the sorted pairs back into the lists rather than passing them to the dict constructor:

data[:], timeStamp[:] = zip(*sorted(zip(data,timeStamp)))

Output:

[1251, 2372, 5172, 6234], ['0003', '0002', '9999', '0001']
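One thing to keep in mind with the dictionary variant: dict keys are unique, so if the same data value occurs more than once, only one timestamp survives per value. A small sketch with made-up values (not from the question) shows the difference from the list of pairs:

```python
# Hypothetical readings in which the value 5 occurs twice.
data = [5, 3, 5]
timeStamp = ['0001', '0002', '0003']

pairs = sorted(zip(data, timeStamp))  # keeps every reading
as_dict = dict(pairs)                 # later duplicates overwrite earlier ones

print(pairs)    # [(3, '0002'), (5, '0001'), (5, '0003')]
print(as_dict)  # {3: '0002', 5: '0003'}
```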
0

To organize the data in the way you described, you could simply do:

sorted(zip(timeStamp, data), key=lambda x: x[1])

or

from operator import itemgetter

sorted(zip(timeStamp, data), key=itemgetter(1))

To store this object, you could pickle it, and a good description is here. Obviously, there are a lot of options to store it.
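A minimal sketch of the pickle route, assuming the sorted pairs should be written to a file and read back later. The file name sorted_pairs.pkl and the temporary directory are placeholders, and the sample values are the ones visible in the question.

```python
import os
import pickle
import tempfile

timeStamp = ['0001', '0002', '0003', '9999']
data = [6234, 2372, 1251, 5172]
sorted_pairs = sorted(zip(timeStamp, data), key=lambda x: x[1])

# Hypothetical location; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "sorted_pairs.pkl")

with open(path, "wb") as f:   # pickle files are binary
    pickle.dump(sorted_pairs, f)

with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == sorted_pairs)  # True
```

Only unpickle files that come from a source you trust, since loading a pickle can execute arbitrary code.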

nikeros
0

Well, that's as easy as

records = list(zip(data, timeStamp))

Sorted:

records.sort()

In Python, tuples are compared elementwise from left to right, so with data placed first in each tuple there is no need to provide a key function in this case. That's it. There is no need to overcomplicate it, as in some comments.
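One consequence of the left-to-right comparison, sketched with made-up values: when two data values are equal, the plain sort falls back to comparing the timestamps, whereas a key on the data alone keeps tied records in their original order because Python's sort is stable.

```python
# Hypothetical readings in which the value 5 occurs twice.
data = [5, 3, 5]
timeStamp = ['0003', '0002', '0001']

records = list(zip(data, timeStamp))
records.sort()  # ties on data are broken by the timestamp

# Keyed on data only: ties keep their original relative order.
by_data_only = sorted(zip(data, timeStamp), key=lambda r: r[0])

print(records)       # [(3, '0002'), (5, '0001'), (5, '0003')]
print(by_data_only)  # [(3, '0002'), (5, '0003'), (5, '0001')]
```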

otykhonruk