
I have a collection called project. This collection contains different documents, and every document contains an array of objects called data.


I want to filter the data (Excel files) by projectAlias, then use pymongo and pandas to structure the result as a SQL-style table (columns and rows).


Manfred Tijerino

2 Answers


There's no code here, so I have to make some guesses:

  • I'm going to assume all your data is already in its own array, extracted from whatever form it originally came in. If it needed to be collected from multiple documents, I assume that's already done
  • I assume each object has a key "projectAlias" with a string value
  • I assume any objects without a "projectAlias" key have been dealt with
  • For laziness, I assume you want to order the data lexicographically (e.g. "a" < "b", "A" < "a")

Something like this might be useful:

#Made up function for first and third assumptions
data_array = collect_data(documents)
data_array.sort(key=lambda obj: obj["projectAlias"])

#Or, to create a new array with sorted data
sorted_data = sorted(data_array, key=lambda obj: obj["projectAlias"])
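Since collect_data is made up, here is one way it might look. This is only a sketch under the assumptions listed above: the documents are plain dicts (as pymongo returns them), the array lives under the key "data", and objects without a "projectAlias" key are simply skipped. The sample documents and the "rows" field are invented for illustration.

```python
def collect_data(documents):
    """Flatten the 'data' arrays of all documents into one list.

    Hypothetical helper: skips documents without a 'data' array and
    objects without a 'projectAlias' key.
    """
    collected = []
    for doc in documents:
        for obj in doc.get("data", []):
            if "projectAlias" in obj:
                collected.append(obj)
    return collected


# Invented sample documents, shaped like the collection in the question.
documents = [
    {"name": "first", "data": [{"projectAlias": "beta", "rows": 3}, {"rows": 1}]},
    {"name": "second", "data": [{"projectAlias": "alpha", "rows": 7}]},
    {"name": "third"},
]

data_array = collect_data(documents)

# Filtering by alias is then a plain list comprehension.
matching = [obj for obj in data_array if obj["projectAlias"] == "alpha"]
```

With real data, documents could be something like the result of collection.find() from pymongo, since a cursor can be iterated the same way as a list.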

The key argument of Python's built-in sort takes a function, calls it once on each element of the array, and sorts the elements by the values that function returns. Python compares strings character by character using their Unicode code points, which puts capital letters before lowercase ones for the English alphabet. That changes when you get into accents, umlauts, and other variations. I have no insight there.
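A quick way to see both points (the key function and the capital-before-lowercase ordering) is a tiny example. The alias values here are invented:

```python
names = ["beta", "Alpha", "alpha", "Beta"]

# Default string comparison: capital letters sort before lowercase ones.
plain = sorted(names)

# The same ordering, applied to objects through a key function.
records = [{"projectAlias": name} for name in names]
ordered = sorted(records, key=lambda obj: obj["projectAlias"])
aliases = [obj["projectAlias"] for obj in ordered]

print(plain)    # ['Alpha', 'Beta', 'alpha', 'beta']
print(aliases)  # ['Alpha', 'Beta', 'alpha', 'beta']
```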

If your data needs some other ordering, define a different lambda function for key whose output matches the sort you want. For example, you could sort by the length of the value:

#Made up function for first and third assumptions
data_array = collect_data(documents)
data_array.sort(key=lambda obj: len(obj["projectAlias"]))

#Or, to create a new array with sorted data
sorted_data = sorted(data_array, key=lambda obj: len(obj["projectAlias"]))
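Two other key functions that might be handy, again only as a sketch with invented data: str.casefold() gives a case-insensitive ordering, and returning a tuple sorts by several criteria at once (here length first, then the alias itself).

```python
data_array = [
    {"projectAlias": "beta"},
    {"projectAlias": "Alpha"},
    {"projectAlias": "alpha"},
    {"projectAlias": "Beta"},
]

# Case-insensitive: "Alpha" and "alpha" compare equal, so they keep
# their original relative order (Python's sort is stable).
ignore_case = sorted(data_array, key=lambda obj: obj["projectAlias"].casefold())

# Tuple key: shorter aliases first, ties broken by the alias itself.
by_length = sorted(
    data_array,
    key=lambda obj: (len(obj["projectAlias"]), obj["projectAlias"]),
)

print([obj["projectAlias"] for obj in ignore_case])  # ['Alpha', 'alpha', 'beta', 'Beta']
print([obj["projectAlias"] for obj in by_length])    # ['Beta', 'beta', 'Alpha', 'alpha']
```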

If you want more info, the "Key Functions" section in python.org's wiki might be useful.

ig-

Since this question seems very generic, I'm going to suggest using the $filter operator in an aggregation pipeline:

{ $filter: { input: <array>, as: <string>, cond: <expression> } }

More details can be found at https://docs.mongodb.com/manual/reference/operator/aggregation/filter/
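From pymongo, this could look roughly like the sketch below. It assumes projectAlias is a key on each object inside the data array and that the collection is called project, as in the question. The function names build_pipeline and fetch_frame, the database name, and the connection string are placeholders I made up. $unwind and $replaceRoot turn each matching array element into its own result row, which pandas can then lay out as columns and rows.

```python
def build_pipeline(alias):
    """Aggregation pipeline keeping only data items with the given alias."""
    return [
        {
            "$project": {
                "_id": 0,
                "data": {
                    "$filter": {
                        "input": "$data",
                        "as": "item",
                        "cond": {"$eq": ["$$item.projectAlias", alias]},
                    }
                },
            }
        },
        # One output document per remaining array element.
        {"$unwind": "$data"},
        # Promote the element itself to the top level.
        {"$replaceRoot": {"newRoot": "$data"}},
    ]


def fetch_frame(collection, alias):
    """Run the pipeline and load the rows into a pandas DataFrame."""
    import pandas as pd  # imported here so build_pipeline works without pandas

    rows = list(collection.aggregate(build_pipeline(alias)))
    return pd.DataFrame(rows)


pipeline = build_pipeline("alpha")
```

Usage would be something like fetch_frame(MongoClient("mongodb://localhost:27017")["mydb"]["project"], "alpha"), with the connection details replaced by your own. If the objects in data are nested, pd.json_normalize(rows) may give flatter columns than pd.DataFrame(rows).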

Prashanth M