I am trying to drop some empty rows from my dataframe. The following code shows that the column dtypes are indeed sparse:
items_users_sparse_top_tags_df = items_users_sparse_pd.loc[tracks_tags_df.index]
items_users_sparse_top_tags_df.rename_axis('tracks', axis='index', inplace=True)
items_users_sparse_top_tags_df.dtypes
and the result:
playlists
37i9dQZF1DX7KNKjOK0o75 Sparse[int64, 0]
37i9dQZF1DWT1y71ZcMPe5 Sparse[int64, 0]
37i9dQZF1DX1tyCD9QhIWF Sparse[int64, 0]
37i9dQZF1DWSXBu5naYCM9 Sparse[int64, 0]
3JwPVKISB9IBlE2RST1MVn Sparse[int64, 0]
0lDMDuxqUYRAHAg2aSB4Mh Sparse[int64, 0]
6JX1W7EUwl28ApynqRIzGd Sparse[int64, 0]
73pA7uClVdMP4UM4NHYkjw Sparse[int64, 0]
7rRuBmh62FSsGh7ymtIUl3 Sparse[int64, 0]
2moEpTGsu9XpWjc7DMCgH6 Sparse[int64, 0]
Length: 3990, dtype: object
When I try to remove the users that are empty (they become rows after the transpose), the dtypes change from sparse to object. The code:
users_items_sparse_dropped = items_users_sparse_top_tags_df.T[(items_users_sparse_top_tags_df != 0).any()]
and the resulting dtypes:
tracks
2res3Ptlahsu1kh5XtFhu4 object
4UGxnxGlpc7BB8Cbu8vITC object
63diy8Bzm0pHMAU37By2Nh object
6wBHYoPsAqS88OwfjCvlaq object
1aoaegj0Bv8p1N6dWyCDbr object
...
2IH4PRZxA3W6sIWcFU0GKZ object
2JKlf0IYz5oWsT3OCLyjpO object
0fa2P8krhE1K19MUUh0meb object
2CM7CAL7aJ5WkPU0oGbA96 object
0w2U0uERbUTJMNIKdTSUkj object
Length: 15679, dtype: object
The code does remove the empty users-as-rows, but I would prefer to keep the dataframe sparse so that I do not have to convert it back afterwards.
The reason I use sparse dataframes rather than scipy sparse formats directly is that the IDs stay attached as index and column labels, so rows and columns cannot get mixed up during data manipulation.
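To make the goal concrete, here is a minimal sketch of one possible workaround: a round trip through scipy that carries the ID labels along explicitly. The toy data, the made-up IDs, and the helper name `drop_empty_rows_after_transpose` are assumptions for illustration only and are not part of my real code. It relies on `DataFrame.sparse.to_coo` and `pd.DataFrame.sparse.from_spmatrix`, and assumes every column is sparse with fill value 0.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# Toy stand-in for items_users_sparse_top_tags_df:
# tracks as rows, playlists as columns (IDs are made up).
tracks = pd.Index(["track_a", "track_b", "track_c"], name="tracks")
playlists = pd.Index(["playlist_1", "playlist_2", "playlist_3"], name="playlists")
counts = sparse.csr_matrix(
    np.array([[1, 0, 0], [0, 0, 2], [3, 0, 0]], dtype=np.int64)
)
items_users = pd.DataFrame.sparse.from_spmatrix(
    counts, index=tracks, columns=playlists
)


def drop_empty_rows_after_transpose(df):
    """Transpose a sparse dataframe and drop all-zero rows, keeping it sparse.

    Hypothetical helper: assumes every column of df is sparse with fill value 0.
    """
    # Transpose in scipy, where the operation stays sparse.
    mat = df.sparse.to_coo().tocsr().T.tocsr()
    # Remove explicitly stored zeros so they do not count as non-empty entries.
    mat.eliminate_zeros()
    # Boolean mask of rows that hold at least one stored (non-zero) value.
    keep = mat.getnnz(axis=1) > 0
    # Rebuild the dataframe, re-attaching the IDs as labels.
    return pd.DataFrame.sparse.from_spmatrix(
        mat[keep], index=df.columns[keep], columns=df.index
    )


users_items = drop_empty_rows_after_transpose(items_users)
print(users_items.dtypes)
```

In this toy case `playlist_2` has no entries, so it is dropped, and the remaining rows are rebuilt from a scipy matrix rather than from a transposed dataframe. It works, but it is exactly the kind of manual label bookkeeping I was hoping to avoid.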