2

I am trying to learn python 2.7 by converting code I wrote in VB to python. I have column names and I am trying to create a empty dataframe or list then add rows by iterating (see below). I do not know the total number of rows I will need to add in advance. I can create a dataframe with the column names but can't figure out how to add the data. I have looked at several questions like mine but the row/columns of data are unknown in advance.

snippet of code:

cnames=['Security','Time','Vol_21D','Vol2_21D','MaxAPV_21D','MinAPV_21D' ]
df_Calcs = pd.DataFrame(index=range(10), columns=cnames)

this creates the empty df (df_Calcs)...then the code below is where I get the data to fill the rows...I use n as a counter for the new row # to insert (there are 20 other columns that I add to the row), but the below should explain what I am trying to do.

i = 0
n = 0
while True:
        df_Calcs.Security[n] = i + 1
        df_Calcs.Time[n] = '09:30:00'
        df_Calcs.Vol_21D[n] = i + 2
        df_Calcs.Vol2_21D[n] = i + 3
        df_Calcs.MaxAPV_21D[n] = i + 4
        df_Calcs.MinAPV_21D[n] = i + 5
        i = i +1
        n = n +1
        if i > 4:
           break

print df_Calcs If I should use a list or array instead please let me know, I am trying to do this in the fastest most efficient way. This data will then be sent to a MySQL db table.

Result...

  Security      Time Vol_21D Vol2_21D MaxAPV_21D MinAPV_21D
0        1  09:30:00       2        3          4          5
1        2  09:30:00       3        4          5          6
2        3  09:30:00       4        5          6          7
3        4  09:30:00       5        6          7          8
4        5  09:30:00       6        7          8          9
5      NaN       NaN     NaN      NaN        NaN        NaN
6      NaN       NaN     NaN      NaN        NaN        NaN
7      NaN       NaN     NaN      NaN        NaN        NaN
8      NaN       NaN     NaN      NaN        NaN        NaN
9      NaN       NaN     NaN      NaN        NaN        NaN
Hedgebox
  • 159
  • 2
  • 11
  • One fundamental thing of pandas is it is built on top of numpy and so it runs matrix or blockwise operations as opposed to looping. You appear to be running aggregations which can be done in SQL query or in pandas without any looping. Please post current data and desired results. – Parfait Sep 23 '18 at 21:59
  • I am doing some aggregations and several other things. I am able to accomplish creating the dataframe by changing the 1st snippet, which I have edited above. Then deleting any blank rows in the dataframe. I would much prefer not having to "over" estimate the number of rows. – Hedgebox Sep 24 '18 at 00:16
  • Once again, please post sample data for [MCVE] AND desired results. See also [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). – Parfait Sep 24 '18 at 02:44
  • Ok...Now I have edited the original question...the code above works and produces the results provided. But my question remains...how can I create an empty dataframe with only column headers and add new rows when I do not know the number of rows I need until completed? df_Calcs = pd.DataFrame(columns=cnames) – Hedgebox Sep 24 '18 at 04:05
  • Back to my very first point that your process can certainly be improved without looping on empty df using proper pandas-style vectorized operations. Please post initial sample data, so we can run your code to produce your output. – Parfait Sep 24 '18 at 15:33

1 Answers1

1

You have many ways to do that.

Create empty dataframe:

cnames=['Security', 'Time', 'Vol_21D', 'Vol2_21D', 'MaxAPV_21D', 'MinAPV_21D']
df = pd.DataFrame(columns=cnames)

Output:

Empty DataFrame
Columns: [Security, Time, Vol_21D, Vol2_21D, MaxAPV_21D, MinAPV_21D]
Index: []

Then, in loop you can create a pd.series and append to your dataframe, example:

df.append(pd.Series([1, 2, 3, 4, 5, 6], cnames), ignore_index=True)

Or you can append a dict:

df.append({'Security': 1,
           'Time': 2,
           'Vol_21D': 3,
           'Vol2_21D': 4,
           'MaxAPV_21D': 5,
           'MinAPV_21D': 6
          }, ignore_index=True)

It will be the same output:

  Security Time Vol_21D Vol2_21D MaxAPV_21D MinAPV_21D
0        1    2       3        4          5          6

But I think, more faster and pythonic way: first create an array, then append all raws to array and make data frame from array.

data = []
for i in range(0,5):
    data.append([1,2,3,4,i,6])
df = pd.DataFrame(data, columns=cnames)

I hope it helps.

  • I think your last one will help...I never thought of creating the dataframe after the data was in a list. I will then know the number of entries in the dataframe. That should work perfectly...I will use the while loop I have with the counter & I will let you know. – Hedgebox Sep 24 '18 at 09:39