
I have this query:

(?<- (hfs-textline data-out :sinkmode :replace)
        [?item1 ?item2]
        ((hfs-textline data-in) ?line)
        (data-line? ?line)
        (filter-out-data (#(vector (s/split % #",")) ?line) :> ?item1 ?item2)
        )

(defn data-line? [^String row]
  (and (not= -1 (.indexOf row ","))
       (not (.endsWith row ","))
       (not (.startsWith row ","))))

(defn filter-out-data [data]
  (<- [?item1 ?item2]
      (data :#> 9 {4 ?item1
                  8 ?item2})))

The query reads a CSV file line by line and keeps the lines that meet the valid-data conditions (data-line?). This part works. Each line is then supposed to be split on commas, and the resulting vector passed to filter-out-data, which returns two items extracted from it. When I execute the query I get the following error: Unable to resolve symbol: ?line in this context.

I have tried several ways of passing the result of split to filter-out-data (I would like this to be flexible, because the number of fields produced by the split will vary). I am just starting with Clojure and Cascalog, so I would be grateful if you could point me in the right direction. Thanks!
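Because the question asks for something that copes with lines that split into different numbers of fields, here is a minimal plain-Clojure sketch that can be tried in a REPL before wiring it into Cascalog. `pick-fields` is a hypothetical helper, not part of the original code. The wrapper keeps the name filter-out-data and reuses the 0-based positions 4 and 8 from the question's `:#> 9 {4 ?item1 8 ?item2}`. It assumes, as the answer below does, that Cascalog accepts a plain function returning a vector as an operation with several output variables.

```clojure
(require '[clojure.string :as s])

;; Hypothetical helper: return the fields at the given 0-based positions of
;; one CSV line. clojure.string/split returns a vector, and `get` on a vector
;; returns nil for an out-of-range index, so short lines yield nil instead of
;; throwing.
(defn pick-fields [positions ^String line]
  (let [fields (s/split line #",")]
    (mapv #(get fields %) positions)))

;; Wrapper with the name used in the query; positions 4 and 8 mirror the
;; question's (data :#> 9 {4 ?item1 8 ?item2}).
(defn filter-out-data [^String line]
  (pick-fields [4 8] line))
```

For example, `(pick-fields [4 8] "a,b,c,d,e,f,g,h,i")` gives `["e" "i"]`, while a line with only three fields gives `[nil nil]`. Changing the positions vector is then the only edit needed for a differently shaped file.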

juan.facorro
Anna Pawlicka
  • This is a wild guess since I'm not that familiar with Cascalog, but maybe the `?line` symbol is parsed by the logic solver only if it's on the "first level" of the query. Try moving the `s/split` to the `filter-out-data` function so you are left with `(filter-out-data ?line :> ?item1 ?item2)` as the expression for the query. – juan.facorro Jul 07 '13 at 16:24
  • It is passed from one predicate to the next. In a Cascalog query the output var (?line in this example) is constrained by the use of predicates. Their ordering doesn't matter. The problem was as sortega explained, but the solution was moving the expression to the function, so your guess was a good one :) – Anna Pawlicka Jul 07 '13 at 17:44

1 Answer


The function filter-out-data builds a subquery (it returns the result of <-), but you are trying to use it as a predicate inside ?<-, and that is not going to work.

I recommend moving all the logic in the expression (#(vector (s/split % #",")) ?line) into a regular function, which you can still call filter-out-data.

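(require '[clojure.string :as s])

(defn filter-out-data [^String data]
  ;; split the line on commas and keep the fields at 0-based positions 4 and 8,
  ;; the same positions as (data :#> 9 {4 ?item1 8 ?item2}) in the question
  (let [[_ _ _ _ item1 _ _ _ item2] (s/split data #",")]
    [item1 item2]))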

(?<- (hfs-textline data-out :sinkmode :replace)
    [?item1 ?item2]
    ((hfs-textline data-in) ?line)
    (data-line? ?line)
    (filter-out-data ?line :> ?item1 ?item2))
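To check the per-line logic without a Hadoop or Cascalog setup, the two functions can be run over an in-memory sequence of lines. `run-locally` is a hypothetical helper that only approximates the query above: it mimics the filter-then-map shape of the predicates but ignores taps, distributed execution, and details such as how Cascalog treats duplicate tuples.

```clojure
(require '[clojure.string :as s])

;; Same predicate as in the question.
(defn data-line? [^String row]
  (and (not= -1 (.indexOf row ","))
       (not (.endsWith row ","))
       (not (.startsWith row ","))))

;; The answer's function, keeping fields at 0-based positions 4 and 8.
(defn filter-out-data [^String data]
  (let [[_ _ _ _ item1 _ _ _ item2] (s/split data #",")]
    [item1 item2]))

;; Hypothetical in-memory stand-in for the ?<- query: keep the lines that
;; pass data-line?, then turn each one into an [?item1 ?item2] tuple.
(defn run-locally [lines]
  (->> lines
       (filter data-line?)
       (map filter-out-data)))
```

With the input `["header only" "a,b,c,d,e,f,g,h,i" ",bad,start" "1,2,3,4,5,6,7,8,9"]`, the first line (no comma) and the third line (leading comma) are dropped, and the result is `(["e" "i"] ["5" "9"])`.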

You can simplify the code even further by using a CSV library such as data.csv.

sortega