0

I've a pgAdmin database which contains millions of rows in geojson format.Using this table I create Tableau dashboard. Since the rows contain data in geojson format I've to query like this:

select jsondata ->> 'id' as id,
jsondata -> 'properties' ->> 'qq',,
jsondata -> 'properties' ->> 'www',
jsondata -> 'properties' ->> 'eeee',
jsondata -> 'properties' ->> 'qwer' ,
jsondata -> 'properties' ->> 'yyy',
split_part(jsondata -> 'properties' -> '@ab:com:abc:wxy'->'tags'->>0,'@',2),
split_part(jsondata -> 'properties' -> '@ab:com:abc:wxy'->'tags'->>1,'@',2),
split_part(jsondata -> 'properties' -> '@ab:com:abc:wxy'->'tags'->>2,'@',2),
split_part(jsondata -> 'properties' -> '@ab:com:abc:wxy'->'tags'->>3,'@',2),
split_part(jsondata -> 'properties' -> '@ab:com:abc:wxy'->'tags'->>4,'@',2),
split_part(jsondata -> 'properties' -> '@ab:com:abc:wxy'->'tags'->>5,'@',2),
split_part(jsondata -> 'properties' -> '@ab:com:abc:wxy'->'tags'->>6,'@',2),
TIMESTAMP 'epoch' + cast(cast(jsondata -> 'properties'->'@ab:com:abc:wxy'->>'createdAt'as bigint)/1000 as int) * INTERVAL '1 second' as create_date,
TIMESTAMP 'epoch' + cast(cast(jsondata -> 'properties'->'@ab:com:abc:wxy'->>'updatedAt'as     bigint)/1000 as int) * INTERVAL '1 second' as Update_date
from schema.table_name

When data was in thousand rows I used to connect this database live with tableaue but now the data has increased and it's going to increase day by day. I want to create tableaue dashboard using ETL process, I am thinking about Talend but I want to do it programmatically using python.

Can anyone suggest me how to do it?

Vishal Upadhyay
  • 781
  • 1
  • 5
  • 19

1 Answers1

0

You can use Pyspark. Official website: http://spark.apache.org/. Spark is designed for processing and analyzing big data and offers APIs in numerous languages. Spark is a good choice for ETL if the data you’re working with is very large, and speed and size in your data operations. There are other libraries available. But you said you want to use large data, Pyspark will do your work.

You can also try mara. It is a lightweight Python ELT tool. It is powerful and easy to learn and use. It uses PostgreSQL as the data processing engine that will further help you make your project as you already are using Postgre. https://github.com/mara

Zubad Ibrahim
  • 371
  • 2
  • 14