This loop currently takes almost 3 hours on my desktop, which runs at 5 GHz (overclocked). How can I speed it up?
import pandas as pd

df = pd.DataFrame(columns=['clientId', 'url', 'count'])
idx = 0
for row in rows:
    # Append one row at a time by assigning to the next integer label
    df.loc[idx] = pd.Series({'clientId': row.clientId, 'url': row.pagePath, 'count': row.count})
    idx += 1
rows holds the query results (JSON data) in a BigQuery RowIterator. Printing it and its type gives:
<google.cloud.bigquery.table.RowIterator object at 0x000001ADD93E7B50>
<class 'google.cloud.bigquery.table.RowIterator'>
The individual rows look like this:
Row(('xxxxxxxxxx.xxxxxxxxxx', '/en-us/index.html', 45), {'clientId': 0, 'pagePath': 1, 'count': 2})
Row(('xxxxxxxxxx.xxxxxxxxxx', '/en-us/contact.html', 65), {'clientId': 0, 'pagePath': 1, 'count': 2})
Row(('xxxxxxxxxx.xxxxxxxxxx', '/en-au/index.html', 64), {'clientId': 0, 'pagePath': 1, 'count': 2})
Row(('xxxxxxxxxx.xxxxxxxxxx', '/en-au/products.html', 56), {'clientId': 0, 'pagePath': 1, 'count': 2})
Row(('xxxxxxxxxx.xxxxxxxxxx', '/en-us/employees.html', 54), {'clientId': 0, 'pagePath': 1, 'count': 2})
Row(('xxxxxxxxxx.xxxxxxxxxx', '/en-us/contact/cookies.html', 44), {'clientId': 0, 'pagePath': 1, 'count': 2})
Row(('xxxxxxxxxx.xxxxxxxxxx', '/en-au/careers.html', 91), {'clientId': 0, 'pagePath': 1, 'count': 2})
Row(('xxxxxxxxxx.xxxxxxxxxx', '/en-ca/careers.html', 42), {'clientId': 0, 'pagePath': 1, 'count': 2})
Row(('xxxxxxxxxx.xxxxxxxxxx', '/en-us/contact.html', 44), {'clientId': 0, 'pagePath': 1, 'count': 2})
Row(('xxxxxxxxxx.xxxxxxxxxx', '/', 115), {'clientId': 0, 'pagePath': 1, 'count': 2})
Row(('xxxxxxxxxx.xxxxxxxxxx', '/suppliers', 51), {'clientId': 0, 'pagePath': 1, 'count': 2})
Row(('xxxxxxxxxx.xxxxxxxxxx', '/en-us/search.html', 60), {'clientId': 0, 'pagePath': 1, 'count': 2})
Row(('xxxxxxxxxx.xxxxxxxxxx', '/en-au/careers.html', 50), {'clientId': 0, 'pagePath': 1, 'count': 2})
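One direction that might help, as a minimal sketch rather than a tested fix: collect the values into a plain Python list first and build the DataFrame in a single call, instead of growing it with df.loc one row at a time. The helper name rows_to_frame and the SimpleNamespace objects are hypothetical stand-ins so the snippet runs without BigQuery; it only assumes each row exposes clientId, pagePath and count as attributes, as in the loop above.

```python
from types import SimpleNamespace

import pandas as pd

# Hypothetical stand-in for the BigQuery RowIterator: any iterable of
# objects exposing clientId, pagePath and count as attributes.
rows = [
    SimpleNamespace(clientId='xxxxxxxxxx.xxxxxxxxxx', pagePath='/en-us/index.html', count=45),
    SimpleNamespace(clientId='xxxxxxxxxx.xxxxxxxxxx', pagePath='/en-us/contact.html', count=65),
    SimpleNamespace(clientId='xxxxxxxxxx.xxxxxxxxxx', pagePath='/en-au/index.html', count=64),
]


def rows_to_frame(rows):
    """Build the DataFrame in one call instead of one df.loc assignment per row."""
    # Gather plain tuples first; appending to a list is cheap.
    records = [(row.clientId, row.pagePath, row.count) for row in rows]
    # Construct the frame once, renaming pagePath to url as in the original loop.
    return pd.DataFrame(records, columns=['clientId', 'url', 'count'])


df = rows_to_frame(rows)
print(df)
```

The google-cloud-bigquery client also provides RowIterator.to_dataframe(), which may avoid the manual loop entirely. In that case the column would presumably keep the name pagePath and need renaming to url afterwards.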