1

I am importing data from a keyed table in Kdb+ to a pandas DataFrame using qPython library. If I run a synchronous query

    x=q.sync('select from prod where ID=9 ') 

Then x is of type qpython.qcollection.QKeyedTable. But if I make numpy_temporals=truethe return type is pandas DataFrame.

    from qpython import qconnection
    with qconnection.QConnection(host = 'localhost', port = 5000) as q:
    query = 'select from table where ID=5'
    x=q.sync(query, numpy_temporals = True)
    print x.iloc[0:3,0:3]
    print x.columns.values

x.iloc[0:1,0:1] returns

EMP_ID   PROD_ID   month   total   x 
01        02       jan-17    5.5   6

x.columns.values returns

['month' 'total' 'x']

The data is from a keyed table, the DataFrame is unable to access the primary key fields. The table has 5 fields but the returned data frame shows only 3. I am unable to access the first two columns.

I have looked at the following stackoverflow questions Not able to view all columns in Pandas Data frame, Python pandas, how to widen output display to see more columns? but they do not solve the problem.

Also I want to read the data from the DataFrame into a class Employee, so as to create a feature vector for each employee. I do not want the data to be stored in a DataFrame as certain features will be multi-valued like organization(the employee might work part-time in multiple organizations).

Am I doing it correctly or is there a better way to solve this problem.

Abhinav Choudhury
  • 319
  • 2
  • 3
  • 15

1 Answers1

2

You're looking at a keyed table - the conversion to a pandas DataFrame makes the keys the indexes for the table -

Q process

q)\p 5000
q)t:([a:til 10;b:reverse til 10]c:10?`3;d:10?10i)

Python process

> import pandas as pd
> import numpy as np
> from qpython.qconnection import QConnection as qc
> q = qc('localhost', 5000)
> q.open()
> x = q.sync('select from t', pandas=True)
> x.columns.values
array(['c', 'd'], dtype=object)
> x.index
MultiIndex(levels=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
       labels=[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]],
       names=[u'a', u'b'])

If you wish to view all your columns as a standard DataFrame, with no indexes (other than standard i-indexing), amend your query to

> x = q.sync('0!select from t', pandas=True) 

Note the unkeying performed by the 0!.

> x.columns.values
array(['a', 'b', 'c', 'd'], dtype=object)

It is worth reading the qpython documentation, as it does cover this.

Simon Major
  • 291
  • 2
  • 7