I am trying to run the k-means algorithm using the MADlib library, with Aginity as the client tool. I tried executing:

SELECT * FROM madlib.kmeans_random('select "MPrice" as "MPrice" from "madlib"."sample_sordetail"','MPrice',2,'madlib.squared_dist_norm2','madlib.avg',10,0.001) 

My table name is sample_sordetail.

I am getting the error:

ERROR: XX000: plpy.Error: kmeans error: Data table does not exist! (plpython.c:4648)

Please suggest how to resolve this issue.

Matthew Verstraete
vkumar

2 Answers

Please check the kmeans_random parameter definition: rel_source is of type TEXT, and it expects a table name. "select ... from ..." is a query, not a table. You should be able to find the tutorial on the MADlib website.
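
As a minimal sketch of what that could look like for the table in the question: the first argument is the schema-qualified table name, and the second is an expression that evaluates to an array of doubles. The double quotes around "MPrice" and the cast to double precision[] are assumptions here (they presume the column was created with a mixed-case name and holds a numeric type), so adjust them to match the actual table definition.

```sql
-- Sketch only: assumes madlib.sample_sordetail has a numeric column
-- that was created with the quoted, mixed-case name "MPrice".
SELECT *
FROM madlib.kmeans_random(
    'madlib.sample_sordetail',                -- rel_source: a table name, not a query
    'ARRAY["MPrice"]::double precision[]',    -- expr_point: must evaluate to double precision[]
    2,                                        -- k: number of centroids
    'madlib.squared_dist_norm2',              -- fn_dist: distance function
    'madlib.avg',                             -- agg_centroid: centroid aggregate
    10,                                       -- max_num_iterations
    0.001                                     -- min_frac_reassigned
);
```

If the point expression gets complicated (filters, joins, NULL handling), it is usually simpler to materialize it into a table or view first and pass that name as rel_source.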

Sung Yu-wei
  • I also tried it as per the MADlib documentation: CREATE TABLE km_result AS SELECT * FROM madlib.kmeans_random('madlib.km_sample', 'ARRAY[MPrice]', 2, 'madlib.squared_dist_norm2', 'madlib.avg', 20, 0.001); Here madlib.km_sample is the table name and MPrice is the column. I get this error: CREATE TABLE km_result AS SELECT * FROM madlib.kmean_random('madlib.km_sample, ERROR: 42703: column "mprice" does not exist – vkumar Apr 28 '17 at 16:06
  • The mprice column is in the table, but I still get this error. I am using Greenplum Database 4.3 and MADlib 1.9. Please suggest. – vkumar Apr 28 '17 at 16:12
  • You need to deal with capital letters in Postgres systems: unquoted identifiers are folded to lowercase, so a mixed-case column has to be double-quoted. Also make sure the schema is in your search_path. – Sung Yu-wei Apr 28 '17 at 19:58
  • The schema is in my search_path; select * from madlib.km_sample returns results. – vkumar May 01 '17 at 13:52

After a lot of struggling, I found that the error was due to the data format. The points column passed to k-means must be of type double precision[] (an array of doubles), not a scalar. Reference: http://madlib.incubator.apache.org/docs/master/group__grp__kmeans.html

After changing MPrice to double precision[], I was able to run the k-means algorithm on the dataset.
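
For illustration, one way to get the data into that shape is to build a separate table whose point column is already double precision[]. This is a sketch under assumptions, not the exact statements I ran: the table name madlib.km_points and the column name points are hypothetical, the source column is assumed to be the quoted name "MPrice", and DISTRIBUTED RANDOMLY is Greenplum-specific syntax (drop it on plain PostgreSQL).

```sql
-- Sketch only: km_points and points are hypothetical names.
-- Wrap each scalar price in a one-element array of doubles.
CREATE TABLE madlib.km_points AS
SELECT ARRAY["MPrice"]::double precision[] AS points
FROM madlib.sample_sordetail
WHERE "MPrice" IS NOT NULL          -- skip rows that have no price
DISTRIBUTED RANDOMLY;               -- Greenplum-specific clause

-- Run k-means on the new table; points is lowercase, so no quoting is needed.
SELECT *
FROM madlib.kmeans_random(
    'madlib.km_points',             -- rel_source: table name
    'points',                       -- expr_point: the double precision[] column
    2,
    'madlib.squared_dist_norm2',
    'madlib.avg',
    10,
    0.001
);
```

Using a lowercase column name in the derived table also sidesteps the identifier-case problem mentioned in the comments on the other answer.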

vkumar