I used xargs for parallel processing, following the approach in this thread: Parallel processing with xargs in bash. But the parallel processing increased the CPU load. I am running the script for 57 OpenStack tenants to fetch results from them, and the report runs every 2 hours, which causes a CPU spike.
To decrease the load, I thought of adding a random sleep time, something like below, but it didn't help much. I am not sure whether I can use nice to set a priority, because the results I am fetching come from an OpenStack server. If it's possible, please let me know.
I could remove the parallel processing, but then it would take 6 hours to get all the reports from the tenants, so that is not an option. If there is any other way to optimize this, any ideas or suggestions would be great.
source ../creds/base
printf '%s\n' A B C D E F G H I J K L M N O P |
xargs -n 3 -P 8 bash -c 'for tenant; do
    source ../creds/"$tenant"
    python ../tools/openstack_resource_list.py "$tenant" > ./reports/openstack_reports/"$tenant".html
    # random 1-10 second pause between tenants; $(( )) replaces the deprecated $[ ] form
    sleep "$(( RANDOM % 10 + 1 ))"
done' _
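From what I understand, nice would only lower the local CPU scheduling priority of the workers so that other processes on the box stay responsive; it cannot slow down the OpenStack server side or reduce the total work done. If that is enough, the syntax should be as simple as letting xargs start each worker through nice (a sketch of the same job as above, untested):

source ../creds/base
printf '%s\n' A B C D E F G H I J K L M N O P |
xargs -n 3 -P 8 nice -n 19 bash -c 'for tenant; do
    # nice -n 19 gives these workers the lowest CPU priority locally;
    # the remote OpenStack API calls are unaffected
    source ../creds/"$tenant"
    python ../tools/openstack_resource_list.py "$tenant" > ./reports/openstack_reports/"$tenant".html
done' _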
Python file
import csv
import json
from subprocess import check_output, CalledProcessError

with open('../floating_list/testReports/' + tenant_file + '.csv', 'wb') as myfile:
    fields = ['Name', 'Volume', 'Flavor', 'Image', 'Floating IP']
    writer = csv.writer(myfile)
    writer.writerow(fields)
    try:
        for server in servers:
            # parse the CLI's JSON output directly instead of writing it to a
            # temporary .json file and reading it back in
            data = json.loads(check_output(['openstack', 'server', 'show', server.name, '-f', 'json']))
            name.append(server.name)
            volume = data.get('os-extended-volumes:volumes_attached')
            if volume:
                vol_name_perm = []
                # the inner loop used to reuse i, shadowing the server variable
                # from the outer loop; vol_id keeps the two apart
                for vol_id in [d.get('id') for d in volume]:
                    try:
                        vol_name1 = check_output(['openstack', 'volume', 'show', vol_id, '-c', 'name', '-f', 'value']).rstrip()
                        vol_name_perm.append(vol_name1)
                    except CalledProcessError:
                        vol_name_perm.append('Error')
                vol_name.append(','.join(vol_name_perm))
            else:
                vol_name.append('None')
            ...
        wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
        for row in zip(name, vol_name, flavor, image, addr):
            wr.writerow(row)
    except (CalledProcessError, IndexError) as e:
        print(e)
        print("except", server.name)
WITH GNU PARALLEL
printf '%s\n' A B C D E F | parallel --eta -j 2 --load 40% --noswap 'for tenant; do
    source ../creds/"$tenant"
    python ../tools/openstack_resource_list.py "$tenant" > ./reports/openstack_reports/"$tenant".html
done'
I get syntax error near unexpected token `A'
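As far as I can tell, the reason is that parallel appends each input argument to the end of the command when the command contains no {} placeholder, so the shell is handed "for tenant; do ... done A" and chokes on the stray A after done. With one tenant per job the for loop is unnecessary anyway, so the {} placeholder should be enough (untested sketch, same paths as above):

printf '%s\n' A B C D E F |
parallel --eta -j 2 --load 40% --noswap '
    source ../creds/{}
    python ../tools/openstack_resource_list.py {} > ./reports/openstack_reports/{}.html
'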
WORKAROUND
I am able to manage the load with xargs -n 1 -P 3 for now. This gives me the reports within 2 hours. I still want to explore my options with GNU Parallel, as suggested by Ole Tange.