Is there a way to run posqresql queries in a pandas dataframe?

Question

I have pandas dataframe like this :

	created_at	lat	long	hex_ID
0	2020-10-13 15:12:18.682905	28.690628	77.323285	883da1ab0bfffff
1	2020-10-12 22:49:05.886170	28.755408	77.112289	883da18e87fffff
2	2020-10-13 15:24:17.692375	28.690571	77.323335	883da1ab0bfffff
3	2020-10-12 23:21:13.700226	28.589922	77.082738	883da112a1fffff
4	2020-10-13 15:43:58.887592	28.649227	77.339063	883da1a941fffff

and I want to convert it like this

	created_at	hex_id	count
0	2020-10-28 22:00:00	883da11185fffff	4
1	2020-09-09 10:00:00	883da111ebfffff	2
2	2020-12-02 20:00:00	883da10769fffff	2
3	2020-10-16 07:00:00	883da111c3fffff	1
4	2020-12-13 11:00:00	883da11747fffff	4

As of now i am taking the dataframe dumping it into postgres and running the below query and then exporting data and at last importing back to my notebook .

Query :

SELECT created_at('hour', timestamp),count(lat),hex_id FROM public."ML_Data"
group by created_at('hour', timestamp),hex_id

I was wondering if I could directly do it in the notebook file

or you can simply do a pd.DataFrame.groupby instead of dumping it into postgres for aggregations. — ABC, Jan 05 '21 at 08:53
@ABC i tried pd.read_Sql but i dont think above query will run on it and group by is okay but i want the dataframe in a particular format , is there any code for referrence — Shouhaddo Paul, Jan 05 '21 at 09:02

score 0 · Answer 1 · answered Jan 05 '21 at 09:10

0

Usually I use psycopg2 for fetching data from postgres, an example code:

import psycopg2
from psycopg2 import sql

with psycopg2.connect(
        host='your_host',
        database='your_database',
        user='your_username',
        password='your_password') as con: 

    cursor = con.cursor()

    query = sql.SQL('your_query_string')
    cursor.execute(query)
    
    data = cursor.fetchall()
    data = pd.DataFrame(data, columns=col_names) # your data column names

or i think you could pd.read_sql, check out this post.

answered Jan 05 '21 at 09:10

ABC

635
3
10

let me share a bit more insight on my data , I have connected my notebook to database ,and had queried data using pycopg2 . From that i am getting 3 columns 1.created_at 2.lat 3.long From these lat long using python library H3 ,I am generating the hex_id . . So doing these i am getting the first table , and now i want to convert the 1st table to the 2nd table provided , – Shouhaddo Paul Jan 05 '21 at 10:08
then you can use .groupby as @Ferris provided you an example. – ABC Jan 05 '21 at 11:15

score 0 · Accepted Answer · answered Jan 05 '21 at 10:17

0

just use groupy in df.

# 2020-10-13 15:12:18.682905 -> 2020-10-13 15:00:00
df['created_at_n'] = df['created_at'].astype(str).str.split(':').str[0] + ':00:00'
df.groupby(['created_at_n', 'hex_id'])['lat'].count()

answered Jan 05 '21 at 10:17

Ferris

5,325
1
14
23

Is there a way to run posqresql queries in a pandas dataframe?

2 Answers2