
I have some Python code here which goes into SAP using the BAPI `RFC_READ_TABLE`, queries the USR02 table, and brings back the results. The input is taken from column A of an Excel sheet and the output is pasted into column B. The code runs fine. However, for 1000 records it is taking approximately 8 minutes to run. Can you please help with optimizing the code? I am really new to Python; I managed to write this heavy code but am now stuck at the optimization part.

It would be really great if this can run in 1-2 minutes max.

from pyrfc import Connection, ABAPApplicationError, ABAPRuntimeError, LogonError, CommunicationError
from configparser import ConfigParser
from pprint import PrettyPrinter
import openpyxl

ASHOST='***'
CLIENT='***'
SYSNR='***'
USER='***'
PASSWD='***'
conn = Connection(ashost=ASHOST, sysnr=SYSNR, client=CLIENT, user=USER, passwd=PASSWD)


try:
    wb = openpyxl.load_workbook('new2.xlsx')
    ws = wb['Sheet1']
    for i in range(1, len(ws['A']) + 1):
        x = ws['A' + str(i)].value
        options = [{'TEXT': "BNAME = '" + x + "'"}]
        fields = [{'FIELDNAME': 'CLASS'}, {'FIELDNAME': 'USTYP'}]
        pp = PrettyPrinter(indent=4)
        ROWS_AT_A_TIME = 10
        rowskips = 0
        while True:
            result = conn.call('RFC_READ_TABLE',
                               QUERY_TABLE='USR02',
                               OPTIONS=options,
                               FIELDS=fields,
                               ROWSKIPS=rowskips, ROWCOUNT=ROWS_AT_A_TIME)
            rowskips += ROWS_AT_A_TIME
            if len(result['DATA']) < ROWS_AT_A_TIME:
                break

        data_result = result['DATA']
        length_result = len(data_result)
        for line in range(0, length_result):
            a = data_result[line]["WA"].strip()
            wb = openpyxl.load_workbook('new2.xlsx')
            ws = wb['Sheet1']
            ws['B' + str(i)].value = a
            wb.save('new2.xlsx')

except CommunicationError:
    print("Could not connect to server.")
    raise
except LogonError:
    print("Could not log in. Wrong credentials?")
    raise
except (ABAPApplicationError, ABAPRuntimeError):
    print("An error occurred.")
    raise

EDIT: So here is my updated code. For now, I have decided to output the data on the command line only. The output shows where the time is taken.

import time

start_time = time.time()

try:
    output_list = []
    wb = openpyxl.load_workbook('new3.xlsx')
    ws = wb['Sheet1']
    col = ws['A']
    col_lis = [col[x].value for x in range(len(col))]
    length = len(col_lis)
    for i in range(length):
        print("--- %s seconds Start of the loop ---" % (time.time() - start_time))
        x = col_lis[i]  
        options = [{ 'TEXT': "BNAME = '" + x +"'"}]
        fields = [{'FIELDNAME': 'CLASS'},{'FIELDNAME':'USTYP'}]
        ROWS_AT_A_TIME = 10
        rowskips = 0
        while True:
            result = conn.call('RFC_READ_TABLE', QUERY_TABLE = 'USR02', OPTIONS = options, FIELDS = fields, ROWSKIPS = rowskips, ROWCOUNT = ROWS_AT_A_TIME)         
            rowskips += ROWS_AT_A_TIME
            if len(result['DATA']) < ROWS_AT_A_TIME:
                break
        print("--- %s seconds in SAP ---" % (time.time() - start_time))
        data_result = result['DATA']
        length_result = len(data_result)
        for line in range(0,length_result):
            a= data_result[line]["WA"]
            output_list.append(a)
    print(output_list)  
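For reference, each row in `result['DATA']` comes back as a single fixed-width string in the `WA` field. The `FIELDS` table that `RFC_READ_TABLE` returns alongside it carries `OFFSET`/`LENGTH` metadata for every requested field, so the rows can be split into named columns; a minimal sketch (the offsets in the example below are illustrative, not taken from a real USR02 call):

```python
def parse_rows(result):
    """Split fixed-width WA strings using the FIELDS metadata
    (FIELDNAME/OFFSET/LENGTH) returned by RFC_READ_TABLE."""
    fields = [(f['FIELDNAME'], int(f['OFFSET']), int(f['LENGTH']))
              for f in result['FIELDS']]
    rows = []
    for entry in result['DATA']:
        wa = entry['WA']
        # Slice each field out of the fixed-width row and strip padding
        rows.append({name: wa[off:off + length].strip()
                     for name, off, length in fields})
    return rows

# Illustrative fake result (shape mimics an RFC_READ_TABLE response)
fake_result = {
    'FIELDS': [{'FIELDNAME': 'CLASS', 'OFFSET': '0', 'LENGTH': '12'},
               {'FIELDNAME': 'USTYP', 'OFFSET': '12', 'LENGTH': '1'}],
    'DATA': [{'WA': 'SUPER       A'}],
}
print(parse_rows(fake_result))  # → [{'CLASS': 'SUPER', 'USTYP': 'A'}]
```

This becomes useful once several users are fetched in one call, because including `BNAME` in `FIELDS` lets each returned row be mapped back to its user.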


Boghyon Hoffmann
Anirudh Chauhan
    A word about `RFC_READ_TABLE`: it's not a BAPI, it's a RFC-enabled function module. It has many documented flaws (bugs) and SAP never corrected them for 20 years (see SAP notes). Instead, either use `BAPI_USER_GET_DETAIL`, or directly access the database system (ODBC, etc.), or create your own ABAP API. – Sandra Rossi Sep 28 '20 at 07:18
  • I ran your code in my environment and got 10 minutes for 2000 usernames fetching 5 fields, and that's considering my very low internet speed and VPN overhead. Your runtime for 1000 users/2 fields should be no more than 2-3 minutes. – Suncatcher Sep 28 '20 at 13:25
  • @SandraRossi : Thanks Sandra for the input. You are right its RFC function module. i will try to access the database system instead – Anirudh Chauhan Sep 28 '20 at 14:47
  • @Suncatcher : Thanks for the input. Although my internet speed is superfast, Yet the report is taking 12 seconds for 20 records and more than 8 minutes for 1000 records. Did you do any modifications? – Anirudh Chauhan Sep 28 '20 at 14:49
    If the performance issue is mainly due to the number of RFC calls, you might call `RFC_READ_TABLE` only once per 100 lines from Excel file for instance, with `OPTIONS = BNAME IN ('USER1','USER2','USER3',...)`? (note that `OPTIONS` has lines of 72 characters, so you must split the string into lines of 72 characters) – Sandra Rossi Sep 28 '20 at 16:29
  • @SandraRossi i will definitely try that out!! Thanks for your valuable help Sandra. – Anirudh Chauhan Sep 28 '20 at 17:57
  • I would support Sandra that calling RFC only once will be 10 times more efficient than your current loop calling. People [claim](https://answers.sap.com/answers/566873/view.html) that OPTIONS parameter supports up to 50000 lines and more, which means you can pull up to 250 000 users at once depending in the username length – Suncatcher Sep 29 '20 at 12:26
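Sandra's batching suggestion from the comments can be sketched as follows: build one `BNAME IN (...)` condition for a chunk of usernames and split it into `OPTIONS` lines of at most 72 characters. The helper name and the choice to break only between quoted tokens are assumptions for safety, not documented requirements beyond the 72-character line limit; it also assumes a non-empty list of usernames short enough that no single token exceeds 72 characters:

```python
def build_options(usernames, line_len=72):
    """Build the OPTIONS table for one RFC_READ_TABLE call selecting
    many users at once, e.g. BNAME IN ('U1','U2',...). Each OPTIONS
    line may hold at most `line_len` (72) characters; to be safe we
    only break between tokens, never inside a quoted literal."""
    tokens = ["BNAME IN ("]
    tokens += ["'%s'," % u for u in usernames[:-1]]
    tokens.append("'%s')" % usernames[-1])

    lines, current = [], ""
    for tok in tokens:
        # Start a new OPTIONS line when the next token would overflow
        if len(current) + len(tok) > line_len:
            lines.append(current)
            current = ""
        current += tok
    if current:
        lines.append(current)
    return [{'TEXT': text} for text in lines]

options = build_options(['USER1', 'USER2', 'USER3'])
print(options)  # → [{'TEXT': "BNAME IN ('USER1','USER2','USER3')"}]
```

With this, one `conn.call('RFC_READ_TABLE', ...)` per chunk of, say, 100 usernames replaces 100 individual calls, which is where the RFC round-trip time goes.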

2 Answers


First, I put timing marks at different places in the code, having divided it into functional sections (SAP processing, Excel processing).

Upon analyzing the timings I found that most of the runtime is consumed by the Excel-writing code; consider the intervals:

16:52:37.306272 
16:52:37.405006 moment it was fetched from SAP
16:52:37.552611 moment it was pushed to Excel
16:52:37.558631 
16:52:37.634395 moment it was fetched from SAP
16:52:37.796002 moment it was pushed to Excel
16:52:37.806930
16:52:37.883724 moment it was fetched from SAP
16:52:38.060254 moment it was pushed to Excel
16:52:38.067235 
16:52:38.148098 moment it was fetched from SAP
16:52:38.293669 moment it was pushed to Excel
16:52:38.304640 
16:52:38.374453 moment it was fetched from SAP
16:52:38.535054 moment it was pushed to Excel
16:52:38.542004 
16:52:38.618800 moment it was fetched from SAP
16:52:38.782363 moment it was pushed to Excel
16:52:38.792336 
16:52:38.873119 moment it was fetched from SAP
16:52:39.034687 moment it was pushed to Excel
16:52:39.040712
16:52:39.114517 moment it was fetched from SAP
16:52:39.264716 moment it was pushed to Excel
16:52:39.275649 
16:52:39.346005 moment it was fetched from SAP
16:52:39.523721 moment it was pushed to Excel
16:52:39.530741  
16:52:39.610487 moment it was fetched from SAP
16:52:39.760086 moment it was pushed to Excel
16:52:39.771057   
16:52:39.839873 moment it was fetched from SAP
16:52:40.024574 moment it was pushed to Excel

As you can see, the Excel-writing part takes roughly twice as long as the SAP-querying part.

What is wrong in your code is that you open/initialize the workbook and sheet in each loop iteration; this slows execution a lot and is redundant, as you can reuse the workbook variables from the top.

Another redundant thing is stripping leading and trailing whitespace: Excel does this automatically for string data.

This variant of code

try:
    wb = openpyxl.load_workbook('new2.xlsx')
    ws = wb['Sheet1']
    print(datetime.now().time())
    for i in range(1,len(ws['A'])+1):
        x = ws['A'+ str(i)].value
        options = [{ 'TEXT': "BNAME = '" + x +"'"}]
        fields = [{'FIELDNAME': 'CLASS'},{'FIELDNAME':'USTYP'}]
        ROWS_AT_A_TIME = 10
        rowskips = 0
        while True:  
            result = conn.call('RFC_READ_TABLE', QUERY_TABLE = 'USR02', OPTIONS = options, FIELDS = fields, ROWSKIPS = rowskips, ROWCOUNT = ROWS_AT_A_TIME)         
            rowskips += ROWS_AT_A_TIME
            if len(result['DATA']) < ROWS_AT_A_TIME:
                break
        data_result = result['DATA']
        length_result = len(data_result)
        for line in range(0,length_result):
            ws['B'+str(i)].value = data_result[line]["WA"]
        wb.save('new2.xlsx')
    print(datetime.now().time())
except ...

gives me the following timestamps for the program run:

>>> exec(open('RFC_READ_TABLE.py').read())
18:14:03.003174
18:16:29.014373

2.5 minutes for 1000 user records, which looks like a fair price for this kind of processing.

Suncatcher
  • That's something interesting. So, the problem is strip() and repetitive opening and closing of workbook. I removed both the things as per your code and 500 records took 3.3 minutes. LOL. I guess it is because of my internet speed. – Anirudh Chauhan Sep 28 '20 at 17:51
  • I have updated my code with the new one. Can you take a look at the command line picture and let me know how can I reduce the amount of time the excel is taking? – Anirudh Chauhan Sep 28 '20 at 18:33
  • I ran your new code and it was executed in 111 sec, and what are your results? – Suncatcher Sep 29 '20 at 11:20
  • For me, 3.3 minutes for 500 records. My problem was that the speed of the RFC calling was slow. I guess in your case, it is too fast. So, now i am storing whole data in one list and then hitting the table in SAP in one string. Still working on it. Thanks for your guidance, though. Means a lot – Anirudh Chauhan Sep 29 '20 at 11:51
  • `For me, 3.3 minutes for 500 records` if it is **without** the writing into Excel (just printing) it's enormous. Check your internet speed, ping, latency. do you use default PyRFC settings? – Suncatcher Sep 29 '20 at 12:04

In my opinion, the problem is in the `while True` loop. I think you need to optimize your query logic (or change it). It is hard to say without knowing what data you are interested in from the DB; the other things look easy and fast.

Something that could help is to avoid opening and closing the file continuously: try to compute your "B" column first, then open the xlsx file and paste it all at once. It could help (but I'm pretty sure the query is the problem).

P.S. Maybe you can use a timing library (like here) to find WHERE you spend most of the time.
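A minimal pattern for locating the hot spot is to accumulate separate timers around each functional section. This sketch uses `time.perf_counter` from the standard library (a monotonic, high-resolution clock); the `fetch_from_sap` and `write_to_excel` functions are stand-ins for the `conn.call(...)` and openpyxl blocks, not real APIs:

```python
import time

def fetch_from_sap(name):
    """Stand-in for the conn.call('RFC_READ_TABLE', ...) block."""
    return [name.upper()]

def write_to_excel(rows):
    """Stand-in for the openpyxl cell writes / workbook save."""
    pass

usernames = ["user1", "user2"]   # stand-in for the Excel A column
sap_time = excel_time = 0.0
for name in usernames:
    t0 = time.perf_counter()
    rows = fetch_from_sap(name)
    sap_time += time.perf_counter() - t0   # accumulate SAP-side time

    t0 = time.perf_counter()
    write_to_excel(rows)
    excel_time += time.perf_counter() - t0 # accumulate Excel-side time

print("SAP: %.4fs  Excel: %.4fs" % (sap_time, excel_time))
```

Whichever total dominates tells you which section to optimize first, which is exactly how the accepted answer identified the workbook reloads as the bottleneck.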

Khaos101
    Thanks Khaos. Your answer makes sense. I tried the second thing that you mentioned. Took the whole column as input in the list at once and also took the output as a list and then exported to excel at the end. However, time saved was only 2 seconds for 20 records. Also, the timing library thing was super useful. Thanks. I am trying to reduce the time further but failing. – Anirudh Chauhan Sep 28 '20 at 14:51