As a Senior Developer, it is my responsibility to create and maintain all data-driven pipelines. I work on EMR Serverless to run Spark jobs for the whole infrastructure of multiple data-driven services, using Redshift as the data lake. I also work on AWS DMS tasks to maintain the AWS-managed data warehouse built on RDS. For data preprocessing on AWS, I use Glue jobs and Glue DataBrew jobs on serverless Spark. For scheduling, I use AWS EventBridge, Lambda, and MWAA.
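To make the scheduling piece concrete, here is a minimal sketch of how a Lambda function fired by an EventBridge rule can kick off an EMR Serverless Spark job via boto3. The application ID, execution role ARN, and script path are placeholders, not values from the actual setup:

```python
import boto3

# Hypothetical identifiers; in practice these would come from environment
# variables or the EventBridge event payload.
APPLICATION_ID = "00example123"  # EMR Serverless application ID (placeholder)
EXECUTION_ROLE = "arn:aws:iam::123456789012:role/emr-serverless-job-role"  # placeholder
ENTRY_POINT = "s3://my-bucket/jobs/transform.py"  # placeholder Spark script

emr = boto3.client("emr-serverless")

def handler(event, context):
    """Start an EMR Serverless Spark job run when the EventBridge rule fires."""
    response = emr.start_job_run(
        applicationId=APPLICATION_ID,
        executionRoleArn=EXECUTION_ROLE,
        jobDriver={
            "sparkSubmit": {
                "entryPoint": ENTRY_POINT,
                "sparkSubmitParameters": "--conf spark.executor.memory=4g",
            }
        },
    )
    return {"jobRunId": response["jobRunId"]}
```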
Part of my job is creating and running models in the ETL pipelines after the transformation stage, and versioning the best-fitted models against live data for highly accurate insights. Sources include AWS S3 buckets, AWS DynamoDB, MS SQL Server, and others.
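One simple way to keep versions of the best-fitted models, sketched here under the assumption of a joblib-serializable model and hypothetical bucket and prefix names, is to write each artifact under a timestamped S3 key so a version can be pinned or rolled back later:

```python
import io
from datetime import datetime, timezone

import boto3
import joblib

def publish_model_version(model, bucket="models-bucket", prefix="etl/best-fit"):
    """Serialize a fitted model and store it under a timestamped S3 key."""
    version = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    key = f"{prefix}/{version}/model.joblib"
    buffer = io.BytesIO()
    joblib.dump(model, buffer)  # works with any file-like object
    buffer.seek(0)
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=buffer.getvalue())
    return key  # the returned key identifies this model version
```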
For MLOps, I work with MLRun and BentoML. I have also been using FastAPI for the backend.
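As a minimal illustration of the FastAPI backend side (not the actual service), the sketch below exposes a prediction endpoint; the request schema and the stand-in scoring logic are invented for the example:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PredictRequest(BaseModel):
    features: list[float]

@app.post("/predict")
def predict(req: PredictRequest):
    # In a real service the model would be loaded from the registry;
    # a trivial stand-in keeps the sketch self-contained.
    score = sum(req.features) / max(len(req.features), 1)
    return {"score": score}
```

Run locally with `uvicorn app:app --reload` and POST JSON like `{"features": [1.0, 2.0]}` to `/predict`.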
I have worked as a Data Engineer, specifically creating DAGs in Airflow that implement ETL logic. The work I have done so far includes time series forecasting using LSTM, ARIMA, and FEDOT; datasets are pulled from Datadog and loaded into InfluxDB.
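As an illustration of the kind of DAG this involves, here is a minimal Airflow sketch (assuming Airflow 2.4+ for the `schedule` argument) of an extract-transform-load flow from Datadog into InfluxDB. The DAG id, schedule, and task bodies are placeholders; the real extract and load steps would call the Datadog API and an InfluxDB client:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract(**context):
    # Placeholder: the real task queries the Datadog metrics API.
    return [{"ts": "2024-01-01T00:00:00Z", "value": 1.0}]

def transform(ti, **context):
    rows = ti.xcom_pull(task_ids="extract")
    # Placeholder transformation: e.g. drop gaps, resample, cast types.
    return [r for r in rows if r["value"] is not None]

def load(ti, **context):
    rows = ti.xcom_pull(task_ids="transform")
    # Placeholder: the real task writes points to InfluxDB.
    print(f"loaded {len(rows)} rows")

with DAG(
    dag_id="datadog_to_influxdb",  # hypothetical DAG id
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2 >> t3
```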
Before writing the DAGs, I customized open-source Airflow to override the behavior the product needs as a Dockerized data collector, and then built it.
I also create data pipelines on AWS using S3, Lambda, and related services, and I publish containers to AWS ECR and run them on ECS.
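A typical building block for such pipelines is a Lambda handler triggered by S3 ObjectCreated events. This sketch just reads the new object and logs its size; the downstream hand-off is left as a placeholder:

```python
import urllib.parse

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """Triggered by an S3 ObjectCreated event; reads each new object and
    hands it to the next pipeline stage (placeholder print here)."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # S3 event keys are URL-encoded, so decode before use.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        obj = s3.get_object(Bucket=bucket, Key=key)
        body = obj["Body"].read()
        print(f"received {len(body)} bytes from s3://{bucket}/{key}")
```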
With AWS SageMaker, I have done anomaly detection on error logs taken from the application logs provided by AWS CloudWatch; I have written Lambda functions for this.
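A hedged sketch of the Lambda side of this: a function subscribed to a CloudWatch Logs group decodes the compressed subscription payload and sends error lines to a SageMaker endpoint for scoring. The endpoint name and the JSON request/response shape are assumptions that depend on the deployed model:

```python
import base64
import gzip
import json

import boto3

runtime = boto3.client("sagemaker-runtime")
ENDPOINT_NAME = "log-anomaly-endpoint"  # hypothetical endpoint name

def handler(event, context):
    """Decode a CloudWatch Logs subscription payload and score each
    error line against a SageMaker anomaly-detection endpoint."""
    # Subscription payloads arrive base64-encoded and gzip-compressed.
    payload = base64.b64decode(event["awslogs"]["data"])
    data = json.loads(gzip.decompress(payload))
    for log_event in data["logEvents"]:
        message = log_event["message"]
        if "ERROR" not in message:
            continue
        response = runtime.invoke_endpoint(
            EndpointName=ENDPOINT_NAME,
            ContentType="application/json",
            Body=json.dumps({"text": message}),  # assumed request schema
        )
        score = json.loads(response["Body"].read())
        print(f"anomaly score {score} for: {message[:80]}")
```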
I have also been using Django REST Framework to build backend APIs for some web-based projects.
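As a small illustration of the DRF pattern (with a hypothetical Job model and app name), a ModelSerializer plus a ModelViewSet gives full CRUD endpoints once registered on a router:

```python
from rest_framework import serializers, viewsets

from myapp.models import Job  # hypothetical Django model

class JobSerializer(serializers.ModelSerializer):
    class Meta:
        model = Job
        fields = ["id", "name", "status", "created_at"]  # assumed fields

class JobViewSet(viewsets.ModelViewSet):
    """CRUD endpoints for Job records, wired up via a DRF router in urls.py."""
    queryset = Job.objects.all()
    serializer_class = JobSerializer
```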