
My data is stored as CSV files in an OSS bucket on Alibaba Cloud. I currently run a Python script that works as follows:

  1. Download the file to my local machine.
  2. Make the changes with the Python script on my local machine.
  3. Store the result in AWS.

I need to change this approach and schedule a cron job in Alibaba Cloud so that the script runs automatically. The Python script will be uploaded to Task Management in Alibaba Cloud.

So the new steps will be:

  1. Read a file from the OSS bucket into pandas.
  2. Modify it in pandas (merge it with other data, change some columns).
  3. Store the modified file in AWS RDS.

I am stuck at the first step. The error log reports:

"No module found" for oss2 and pandas.

What is the correct way of doing it?
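
The "No module found" errors suggest that the interpreter behind Task Management does not have `oss2` or `pandas` installed. A minimal sketch for confirming that from inside the task, assuming it runs a standard CPython 3 interpreter (the helper name `missing_modules` is hypothetical and not part of any library):

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names that cannot be imported here."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Shows which interpreter the scheduler actually uses.
    print("Interpreter:", sys.executable)
    # Top-level package names only; "mysql" covers mysql.connector.
    print("Missing:", missing_modules(["oss2", "pandas", "mysql"]))
```

If the modules are reported missing, they would have to be installed into that interpreter's environment (for example with `pip install oss2 pandas`) or bundled with the script. Whether Task Management allows either depends on how the service is deployed.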

This is a rough draft of my script (this is how I was able to run it on my local machine):

import os,re
import oss2  # throws an error: No module found
import datetime as dt
import pandas as pd  # throws an error: No module found
import tarfile
import mysql.connector
from datetime import datetime
from itertools import islice
dates = (dt.datetime.now()+dt.timedelta(days=-1)).strftime("%Y%m%d")
def download_file(access_key_id,access_key_secret,endpoint,bucket):

    #Authentication
    auth = oss2.Auth(access_key_id, access_key_secret)

    # Bucket name
    bucket = oss2.Bucket(auth, endpoint, bucket)

    # Download the file
    try:
        # List all objects in the fun folder and its subfolders.
        for obj in oss2.ObjectIterator(bucket, prefix=dates+'order'):
            order_file = obj.key
            objectName = order_file.split('/')[1]
            df = pd.read_csv(bucket.get_object(order_file)) # to read into pandas
            # FUNCTION to modify and upload
        print("File downloaded")
    except:
        print("Pls check!!! File not read")
    return objectName
  • What is task manager service? Can you share a link? – wanghq Apr 28 '21 at 06:51
  • Sorry, I won't be able to share the link. It is one of the options under Dashboard (Task Management, not Task Manager; I have edited the question too). The other options are: Run the report, project management, Task Management, data source management, Log management, Actuator management, Resource monitoring, JSON format. I am new to this as well, and only have access to the Dashboard. – priya Apr 28 '21 at 09:14

1 Answer

import os, re
import io  # new: wraps the downloaded bytes in a file-like buffer
import oss2
import datetime as dt
import pandas as pd
import tarfile
import mysql.connector
from datetime import datetime
from itertools import islice

# Yesterday's date as YYYYMMDD; the object keys start with this value.
dates = (dt.datetime.now() + dt.timedelta(days=-1)).strftime("%Y%m%d")

def download_file(access_key_id, access_key_secret, endpoint, bucket):

    # Authentication
    auth = oss2.Auth(access_key_id, access_key_secret)

    # Bucket handle (the bucket name is passed in as `bucket`)
    bucket = oss2.Bucket(auth, endpoint, bucket)

    # Stays None if no object matches the prefix or the read fails.
    objectName = None

    # Download the file
    try:
        # List every object whose key starts with "<date>order".
        for obj in oss2.ObjectIterator(bucket, prefix=dates + 'order'):
            order_file = obj.key
            objectName = order_file.split('/')[1]

            # Read the whole object from OSS into memory as bytes.
            bucket_object = bucket.get_object(order_file).read()
            # Wrap the bytes in an in-memory, file-like buffer.
            csv_buf = io.BytesIO(bucket_object)

            df = pd.read_csv(csv_buf)  # to read into pandas
            # FUNCTION to modify and upload
        print("File downloaded")
    except Exception as exc:
        print("Pls check!!! File not read:", exc)
    return objectName
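
The change from the question's script is the `io.BytesIO` step: the object is first read fully into memory with `.read()`, and the resulting bytes are wrapped in a buffer, so `pd.read_csv` receives a regular in-memory file object instead of the network stream. A minimal sketch of that step in isolation, using a hypothetical `FakeOSSStream` class as a stand-in for the object returned by `bucket.get_object()` (the only assumption about the real object is that it exposes `read()`, as used above). It needs no network access and no third-party modules:

```python
import io


class FakeOSSStream:
    """Hypothetical stand-in for the result of bucket.get_object():
    a one-shot stream that can be read once and cannot be rewound."""

    def __init__(self, payload: bytes):
        self._payload = payload

    def read(self) -> bytes:
        data, self._payload = self._payload, b""
        return data


def to_csv_buffer(stream) -> io.BytesIO:
    """Read the whole stream into memory and wrap it in a seekable buffer."""
    return io.BytesIO(stream.read())


stream = FakeOSSStream(b"order_id,amount\n1,9.50\n2,12.00\n")
csv_buf = to_csv_buffer(stream)

first_line = csv_buf.readline()  # peek at the header row
csv_buf.seek(0)                  # rewind so a parser sees the file from the start
# With pandas installed, the buffer can be parsed with: df = pd.read_csv(csv_buf)
```

This only covers reading the object once the imports succeed. It loads the entire file into memory, so it suits CSV files that fit comfortably in RAM.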
  • While this code may solve the question, [including an explanation](//meta.stackexchange.com/q/114762) of how and why this solves the problem would really help to improve the quality of your post, and probably result in more up-votes. Remember that you are answering the question for readers in the future, not just the person asking now. Please edit your answer to add explanations and give an indication of what limitations and assumptions apply. – Suraj Rao Oct 21 '21 at 16:37