0

This is my code below, what I'm trying to do is pull out data from cells in the dataframe and eventually store them to a new dataframe called result.

import pandas as pd

df3 = pd.read_csv('H:/Martin Dow/COSOME.csv', header = 3)

result = pd.DataFrame({'BRICK': [],
          'OUTLET_FLAG': [],
          'OUTLET_ID' : [],
          'MONTH' : [],
          'YEAR'  : [],
          'VALUE' : [],
          'AMOUNT_TYPE': [],
          'MARKET_TYPE' : [],
          'BRAND' : [],
          'SUB_MARKET_TYPE' : []})

result = pd.DataFrame(result)

for i in range(0,len(df3.index)):
  month = 1
  year = 2020
  print(i)

  if(df3.iloc[i][1] == 'D'):
    for j in range(0,24):
      x = df3.iloc[i][2]
      result.loc[len(result.index)] = [df3.iloc[i][0], df3.iloc[i][1], x[x.find('[')+1:x.find(']')], month,year, df3.iloc[i][27+j],'Value', 'R05C0', 'COSOME-E', '']
      month = month+1
      if(month == 13):
        month = 1
        year = year +1

  if(df3.iloc[i][1] == 'H' or df3.iloc[i][1] == 'R'):
    for j in range(0,48):
      x = df3.iloc[i][2]
      if(j<24):
        result.loc[len(result.index)] = [df3.iloc[i][0], df3.iloc[i][1], x[x.find('[')+1:x.find(']')], month,year, df3.iloc[i][3+j],'Value', 'COSOME-E', 'COSOME-E', '']
      else:
        result.loc[len(result.index)] = [df3.iloc[i][0], df3.iloc[i][1], x[x.find('[')+1:x.find(']')], month,year, df3.iloc[i][3+j],'Value', 'R05C0', 'COSOME-E', '']
      month = month+1
      if(month == 13):
        month = 1
        year = year +1
      if(year == 2022):
        year = 2020
        
f = 'cosomexxx.csv'

The CSV is 80000 rows and after 30minutes I am at row#6000. I searched on the internet regarding other functions like iterrows() but that gave me a similar result as iloc. The assignment wants us to create 24 rows for rows with flag 'D' and 48 for rows which do not have flag 'D'. As D occurs 1 in 10 times, it is safe to say that the average is around 45.6 per row. Meaning the final CSV will be ~80000*45.6 = 3648000 rows (Hope my math is right)

Now I can wait for it to finish but it's absurd that it is literally doing 1 row per second.

This is my first time using pandas so any advice?

I searched the internet around and they recommend vectorization or df3.apply() however I can not find a way to implement it on the lines where result.loc is being called. :(

Sample DataFrame

Brick,Flag,Outlet,Jan-2020,Feb-2020,Mar-2020,Apr-2020,May-2020,Jun-2020,Jul-2020,Aug-2020,Sep-2020,Oct-2020,Nov-2020,Dec-2020,Jan-2021,Feb-2021,Mar-2021,Apr-2021,May-2021,Jun-2021,Jul-2021,Aug-2021,Sep-2021,Oct-2021,Nov-2021,Dec-2021,Jan-2020,Feb-2020,Mar-2020,Apr-2020,May-2020,Jun-2020,Jul-2020,Aug-2020,Sep-2020,Oct-2020,Nov-2020,Dec-2020,Jan-2021,Feb-2021,Mar-2021,Apr-2021,May-2021,Jun-2021,Jul-2021,Aug-2021,Sep-2021,Oct-2021,Nov-2021,Dec-2021,Jan-2020,Feb-2020,Mar-2020,Apr-2020,May-2020,Jun-2020,Jul-2020,Aug-2020,Sep-2020,Oct-2020,Nov-2020,Dec-2020,Jan-2021,Feb-2021,Mar-2021,Apr-2021,May-2021,Jun-2021,Jul-2021,Aug-2021,Sep-2021,Oct-2021,Nov-2021,Dec-2021,Jan-2020,Feb-2020,Mar-2020,Apr-2020,May-2020,Jun-2020,Jul-2020,Aug-2020,Sep-2020,Oct-2020,Nov-2020,Dec-2020,Jan-2021,Feb-2021,Mar-2021,Apr-2021,May-2021,Jun-2021,Jul-2021,Aug-2021,Sep-2021,Oct-2021,Nov-2021,Dec-2021,,
1002 - A.H.ISPHANI ROAD,D,DOCTOR - A.H.ISPHANI ROAD KARACHI [121402],,,,,,,,,,,,,,,,,,,,,,,,,30663.81,10604.34,27068.92,9295.54,22686.6,3615.61,10205.99,37882.09,8278.13,11627.87,21182.06,19180.2,17022.59,112318.2,32378.88,23987.27,16900.71,6698.92,11058.7,61924.81,25154.53,25081.11,39896.5,39291.61,,,,,,,,,,,,,,,,,,,,,,,,,476,213,472,174,393,62,192,533,157,236,243,271,244,1906,583,414,339,116,117,587,228,294,461,435,624004.99,9146
1002 - A.H.ISPHANI ROAD,H,AL-AZIZ MEDICAL CENTRE [508551],,,,,,,,,,,,,,,,,,,,,,,,,,246.48,151.08,,517.83,297.9,,,184.86,184.86,554.58,462.15,,,154.05,135.6,169.5,67.8,67.8,67.8,,,135.6,101.7,,,,,,,,,,,,,,,,,,,,,,,,,,8,4,,15,8,,,6,6,18,15,,,5,4,5,2,2,2,,,4,3,3499.59,107
1002 - A.H.ISPHANI ROAD,H,AL-SHIFA MEDICAL CENTRE [505326],,,,,,,,,,,,,,,,,,,,,,,,,58.65,,175.95,293.25,762.45,,117.3,234.6,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,,3,5,13,,2,4,,,,,,,,,,,,,,,,,1642.2,28
1002 - A.H.ISPHANI ROAD,H,FAMILY CARE HOSPITAL [511355],,,,,,,,,,696.8,1072,,964.8,160.8,428.8,,482.4,,1072,1500.8,321.6,,804,,760.75,1173,762.45,390.15,1524.9,234.6,1466.25,1231.65,2091.85,6282.37,10572.75,13619.97,16317.15,22273.77,13716.06,19317.75,12398.47,1868.35,32702.35,16389.52,15058.81,15725.68,28620.2,20897.65,,,,,,,,,,13,20,,18,3,8,,9,,20,28,6,,15,,11,20,13,3,26,4,25,21,35,103,153,215,316,390,241,364,245,37,582,253,255,276,502,367,262900.45,4597
1002 - A.H.ISPHANI ROAD,H,RAJPUT GENERAL HOSPITAL [502299],,,,,,,,,,,,,,,,,,,,,,,,,4530.44,5886.9,2407.86,502.98,2553.1,3551.54,1866.19,1732.83,3710.47,4740.87,4979.08,5205.59,6917.53,6805.39,4047.38,4585.69,2052.94,2586.75,5723.04,4919.04,4607.84,7337.57,7266.11,4576.37,,,,,,,,,,,,,,,,,,,,,,,,,79,99,44,10,42,63,29,31,60,75,75,69,87,78,68,72,34,42,96,81,77,118,116,78,103093.5,1623

Output DataFrame

,BRICK,OUTLET_FLAG,OUTLET_ID,MONTH,YEAR,VALUE,AMOUNT_TYPE,MARKET_TYPE,BRAND,SUB_MARKET_TYPE
0,1002 - A.H.ISPHANI ROAD,D,121402,1,2020,30663.81,Value,R05C0,COSOME-E,
1,1002 - A.H.ISPHANI ROAD,D,121402,2,2020,10604.34,Value,R05C0,COSOME-E,
2,1002 - A.H.ISPHANI ROAD,D,121402,3,2020,27068.92,Value,R05C0,COSOME-E,
3,1002 - A.H.ISPHANI ROAD,D,121402,4,2020,9295.54,Value,R05C0,COSOME-E,
4,1002 - A.H.ISPHANI ROAD,D,121402,5,2020,22686.6,Value,R05C0,COSOME-E,
5,1002 - A.H.ISPHANI ROAD,D,121402,6,2020,3615.61,Value,R05C0,COSOME-E,
Sixteen '
  • 1
  • 1

0 Answers0