5

For instance, if I have the below table, then I want to remove the 3rd row:

Stock   Price
-------------------
GOOG    101
GOOG    102
GOOG    102     <- want to remove this
GOOG    101

Note: even though row 4 is a duplicate of row 1, I don't want to remove it as it is not a consecutive duplicate. That is, it is not a duplicate of the row immediately above.

I would also want to do check for duplicates across multiple fields, not just Price.

thefourtheye
  • 233,700
  • 52
  • 457
  • 497
mchen
  • 9,808
  • 17
  • 72
  • 125

2 Answers2

8
d:([]Stock:4#`GOOG;Price:101 102 102 101)
q)d
Stock Price
-----------
GOOG  101
GOOG  102
GOOG  102
GOOG  101

q)d where not d~'prev d
Stock Price
-----------
GOOG  101
GOOG  102
GOOG  101
jgleeson
  • 955
  • 5
  • 11
  • Nice! How can I further constrain it to check for duplicates only over certain columns? I.e. if a also had a `timestamp` column in the above example, all the timestamps would be different, but I still want to get rid of the same duplications arising in `Price`. Thanks – mchen Apr 11 '14 at 14:55
  • If it's just one or two columns you want to exclude from the duplicate comparison, you can do something like: `d where not t~'prev t:delete Timestamp from d` Otherwise, if it's only one or two columns you want to include in the comparison, you can do: `d where not t~'prev t:select Stock from d` – jgleeson Apr 11 '14 at 15:16
5

You can also use differ

q)t:([]stock:4#`GOOG; price:101 102 102 101)
q)differ t
1101b
q)t where differ t
stock price
-----------
GOOG  101
GOOG  102
GOOG  101

now let's suppose there is a time column, as you indicate in your comment above

q)t:update time:til count i from t
q)t
stock price time
----------------
GOOG  101   0
GOOG  102   1
GOOG  102   2
GOOG  101   3
q)t where differ `stock`price#t
stock price time
----------------
GOOG  101   0
GOOG  102   1
GOOG  101   3

Now going back to the t without a time column, for simplicity. This gives you speed up over the alternative method proposed by @jgleeson (which I think is great, but a speed up is always welcomed so thought I'd share this regardless)

q)\ts do[10000;r:t where differ t]
31 1184j
q)\ts do[10000;r2:t where not t~'prev t]
62 1488j
q)r~r2
1b
JPC
  • 1,891
  • 13
  • 29
  • interestingly enough the underlying k definition of `differ` seems to me to be what jgleeson is doing q)differ ~~': – JPC Apr 24 '14 at 12:37