today I have faced a problem that I couldn't solve on my own, despite searching for solutions - it appeared to me, that either my approach is wrong or noone before asked similar question.
I'm playing around with Markov attribution, so I've got columns with strings that look like that:
A > B > B > C > B > A > C > B > A
etc.
...it is created on base of postgresql function 'string_agg'.
What I think would be important for me is assigning a number of for which time each string appears in entire string. To make it clear, at the end of the day, it would look like this:
A1 > B1 > B2 > C1 > B3 > A2 > C2 > B4 > A3
There are three main challenges:
- there are around 100 different types of elements to be counted that may change it time, so it makes it hard to hardcode it,
- the dataset is around 200k rows,
- strings may be up to few hundred characters long
The only thing that came up to my mind is to write some sort of loop, but it feels like it would take up ages until it finishes.
I also thought about solving it on postgresql level, but couldn't find efficient and easy solution to it neither.