
I'm appending data from a list to a pandas DataFrame, but I keep getting NaN in my entries.

Based on what I've read, I think I might have to specify the data type for each column in my code.

dumps = []
features_df = pd.DataFrame()
for i in range(int(len(ids) / 50)):
    dumps = sp.audio_features(ids[i*50:50*(i+1)])
    for i in range(len(dumps)):
        print(list(dumps[0].values()))
        features_df = features_df.append(list(dumps[0].values()), ignore_index=True)

Expected result, something like this for one row:
[0.833, 0.539, 11, -7.399, 0, 0.178, 0.163, 2.1e-06, 0.101, 0.385, 99.947, 'audio_features', '6MWtB6iiXyIwun0YzU6DFP', 'spotify:track:6MWtB6iiXyIwun0YzU6DFP', 'https://api.spotify.com/v1/tracks/6MWtB6iiXyIwun0YzU6DFP', 'https://api.spotify.com/v1/audio-analysis/6MWtB6iiXyIwun0YzU6DFP', 149520, 4]

Actual result:
danceability energy ... duration_ms time_signature
0 NaN NaN ... NaN NaN
1 NaN NaN ... NaN NaN
2 NaN NaN ... NaN NaN
3 NaN NaN ... NaN NaN
4 NaN NaN ... NaN NaN
5 NaN NaN ... NaN NaN

Every row comes out as NaN like this.
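One plausible mechanism for the all-NaN rows, sketched under the assumption that features_df already had named columns (as the "Actual" output suggests): a bare list carries no column names, so pandas labels its columns 0, 1, ..., and concatenation aligns on labels rather than positions. The sketch uses pd.concat because DataFrame.append was removed in pandas 2.0; the column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical frame that already has named columns
features_df = pd.DataFrame([{"danceability": 0.7, "energy": 0.4}])

values = [0.833, 0.539]  # a bare list: no column names attached

# pd.DataFrame([values]) labels its columns 0 and 1, which match nothing
# in features_df, so the named columns get NaN for the new row
misaligned = pd.concat([features_df, pd.DataFrame([values])], ignore_index=True)
print(misaligned)

# Attaching the existing column names lines the values up correctly
row = pd.DataFrame([values], columns=features_df.columns)
aligned = pd.concat([features_df, row], ignore_index=True)
print(aligned)
```

If each row arrives as a dict, passing the dict itself (for example pd.DataFrame([some_dict])) keeps the key-to-column mapping instead of discarding it with .values().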

  • A complete working example will get you the fastest answer https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – Rich Andrews Mar 30 '19 at 02:22

1 Answer


Calling append() in a tight loop isn't a great way to do this. Instead, you can construct an empty DataFrame and then use .loc to specify where each new row goes, using the DataFrame index label as the insertion point.

For example:

import pandas as pd
  
df = pd.DataFrame(data=[], columns=['n'])
for i in range(100):
    df.loc[i] = i
print(df)
$ time python3 append_df.py
   n
0  0
1  1
2  2
3  3
4  4
5  5
6  6
7  7
8  8
9  9

real    0m13.178s
user    0m12.287s
sys     0m0.617s
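The single-column example extends to the question's case of several named columns. A minimal sketch with hypothetical column names and made-up values: a list assigned to a new label must have one entry per column and is placed in column order, while a dict is matched to the columns by key.

```python
import pandas as pd

# Hypothetical columns standing in for the audio-feature keys
df = pd.DataFrame(columns=["danceability", "energy"])

# Made-up rows: lists are placed positionally, in column order
for values in [[0.833, 0.539], [0.512, 0.774]]:
    df.loc[len(df)] = values  # len(df) is the next unused integer label

# A dict is aligned to the columns by key, so key order does not matter
df.loc[len(df)] = {"energy": 0.6, "danceability": 0.7}

print(df)
```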

The pandas documentation for DataFrame.append (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.append.html) notes:

"Iteratively appending rows to a DataFrame can be more computationally intensive than a single concatenate. A better solution is to append those rows to a list and then concatenate the list with the original DataFrame all at once."
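Applied to the question's loop, that advice might look like the sketch below. It assumes sp.audio_features returns one dict per track ID (as the dumps[0].values() call suggests); fake_audio_features is a hypothetical stand-in so the sketch runs without Spotify credentials, and the IDs are made up. Stepping range by 50 also keeps the final partial batch, which int(len(ids)/50) drops, and every dict in each batch is used rather than dumps[0] on every pass.

```python
import pandas as pd


def fake_audio_features(track_ids):
    """Hypothetical stand-in for sp.audio_features(): one dict per track ID."""
    return [{"id": t, "danceability": 0.5, "tempo": 120.0} for t in track_ids]


ids = [f"track{n}" for n in range(120)]  # made-up IDs; 120 is not a multiple of 50

rows = []  # plain Python list: cheap to grow inside the loop
for start in range(0, len(ids), 50):  # 0, 50, 100: the last batch has 20 IDs
    batch = ids[start:start + 50]
    rows.extend(fake_audio_features(batch))  # add every dict in the batch

new_df = pd.DataFrame(rows)  # dict keys become column names, built once

# If a DataFrame already exists, concatenate once at the end
existing_df = pd.DataFrame([{"id": "old", "danceability": 0.9, "tempo": 90.0}])
features_df = pd.concat([existing_df, new_df], ignore_index=True)
print(features_df.shape)  # (121, 3)
```

With the real client, the stand-in would presumably be replaced by sp.audio_features(batch); the DataFrame is still created only once, after the loop.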

Rich Andrews
  • Can't add to the upvotes cuz I'm a newbie, but this worked. Thanks a lot! – the.lotuseater Mar 31 '19 at 02:47
  • *"append() strategy in a tight loop isn't a great way to do this"* does anyone know why? – PJ_ Sep 11 '21 at 14:17
  • @PedroMartinez pandas is optimized for vectorized operations. The .loc[] provides better precision on editing the df and helps make it clear what is going to happen. Editing answer to update w/ latest pandas dataframe.append() doc. – Rich Andrews Nov 02 '21 at 17:28