
I've written the following simple code to illustrate an issue I'm having:

import pandas as pd

symbols = {'TEST1', 'TEST2', 'TEST3'}

i = 0

for sym in symbols:
    if i == 0:
        cum_df = pd.DataFrame([sym])
        i = 1
    else:
        cum_df.append(pd.DataFrame([sym]), ignore_index=True)

cum_df

I was expecting cum_df to look like this:

+---+-------+
|   |   0   |
+---+-------+
| 0 | TEST1 |
| 1 | TEST2 |
| 2 | TEST3 |
+---+-------+

But instead it looks like this:

+---+-------+
|   |   0   |
+---+-------+
| 0 | TEST3 |
+---+-------+

Where am I going wrong?

Mayank Porwal
Jossy
  • `DataFrame.append` does not work in place the way `list.append` does: it returns a new DataFrame, so you need to reassign the result. In your `else` branch, try `cum_df = cum_df.append(pd.DataFrame([sym]), ignore_index=True)`. Note, though, that building a DataFrame this way is not good practice. – Ben.T May 21 '20 at 18:21
  • Thanks, this worked. However, it seems slow (compared with a simple loop), and the rows come out in the order TEST2, TEST3, TEST1. Any idea why? What would be a better way to build a DataFrame in a loop? In reality I'm extracting `symbols` from a database and then, in a loop, querying the database for data that I extract and combine into a single DataFrame. – Jossy May 22 '20 at 07:40
  • The best practice is to collect the DataFrames you query in a list, e.g. `l = []` and `for sym in symbols: l.append(pd.DataFrame([sym]))`, and then, outside the loop, do `cum_df = pd.concat(l)`. You can also build the list with a list comprehension; see the last example in this [link](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.append.html), which uses the `concat` method. – Ben.T May 22 '20 at 12:23
  • Yep, that link was great! Thanks – Jossy May 22 '20 at 15:27
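The two approaches from these comments can be sketched as follows. This is a minimal illustration, assuming a current pandas: `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so both variants use `pd.concat` instead. `symbols` is sorted first because a Python set has no guaranteed iteration order, which is why the rows came out as TEST2, TEST3, TEST1.

```python
import pandas as pd

# sorted() gives a reproducible order; iterating the set directly does not
symbols = sorted({'TEST1', 'TEST2', 'TEST3'})

# 1) Minimal fix of the original loop: the result must be reassigned.
#    pd.concat stands in for DataFrame.append, which pandas 2.0 removed.
cum_df = None
for sym in symbols:
    new_rows = pd.DataFrame([sym])
    if cum_df is None:
        cum_df = new_rows
    else:
        # concat returns a new DataFrame; without the assignment it is discarded
        cum_df = pd.concat([cum_df, new_rows], ignore_index=True)

# 2) Preferred: collect the pieces in a list and concatenate once after the loop.
frames = [pd.DataFrame([sym]) for sym in symbols]
cum_df2 = pd.concat(frames, ignore_index=True)

print(cum_df2)
```

Each `pd.concat` inside the loop copies every row accumulated so far, so the first variant gets slower as the frame grows; the list-based variant copies the data only once.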

1 Answer


Pass the whole collection to the DataFrame constructor in one call instead of looping:

In [1504]: symbols = {'TEST1', 'TEST2', 'TEST3'}
In [1506]: df = pd.DataFrame(symbols)           

In [1507]: df 
Out[1507]: 
       0
0  TEST1
1  TEST3
2  TEST2

If you want to assign column names, you can do:

In [1509]: df = pd.DataFrame(symbols, columns=['Col1'])
In [1510]: df                                          
Out[1510]: 
    Col1
0  TEST1
1  TEST3
2  TEST2
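A small variant of the constructor approach, as a sketch: converting the set to a sorted list first gives a reproducible row order, since iteration order of a Python set is not guaranteed (which is why the output above lists TEST3 before TEST2). The column name `Col1` is taken from the example above.

```python
import pandas as pd

symbols = {'TEST1', 'TEST2', 'TEST3'}

# A set has no defined order, so sort it into a list before building the frame
df = pd.DataFrame(sorted(symbols), columns=['Col1'])
print(df)
```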
Mayank Porwal
  • Thanks, but I wasn't looking for a shortcut to this particular result; my actual code is more complicated and involves extracting data from multiple sources in a database. See my comment to @Ben.T above – Jossy May 22 '20 at 07:41
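For the database scenario described in this comment, the list-then-concat pattern might look like the sketch below. Everything about the database here is hypothetical: an in-memory SQLite database with a made-up `prices` table (`symbol`, `day`, `close`) and toy values stands in for the real one. `pd.read_sql_query` with `params` runs one query per symbol, and each result goes into a plain Python list until a single `pd.concat` at the end.

```python
import sqlite3

import pandas as pd

# Hypothetical stand-in for the real database: one made-up table with toy rows.
con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE prices (symbol TEXT, day INTEGER, close REAL)')
con.executemany(
    'INSERT INTO prices VALUES (?, ?, ?)',
    [
        ('TEST1', 1, 10.0),
        ('TEST1', 2, 10.5),
        ('TEST2', 1, 20.0),
        ('TEST3', 1, 30.0),
    ],
)

# Step 1: read the symbols from the database; ORDER BY fixes the row order.
symbols = [row[0] for row in con.execute(
    'SELECT DISTINCT symbol FROM prices ORDER BY symbol')]

# Step 2: query once per symbol and keep each result in a list (cheap appends).
frames = []
for sym in symbols:
    frames.append(
        pd.read_sql_query(
            'SELECT symbol, day, close FROM prices WHERE symbol = ? ORDER BY day',
            con,
            params=(sym,),
        )
    )

# Step 3: build the combined DataFrame once, outside the loop.
cum_df = pd.concat(frames, ignore_index=True)
con.close()
print(cum_df)
```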