Until recently, it was possible to generate sample DataFrames in pandas using the pd.util.testing module:

In [22]: import pandas as pd

In [23]: pd.util.testing.makeMixedDataFrame()
Out[23]:
     A    B     C          D
0  0.0  0.0  foo1 2009-01-01
1  1.0  1.0  foo2 2009-01-02
2  2.0  0.0  foo3 2009-01-05
3  3.0  1.0  foo4 2009-01-06
4  4.0  0.0  foo5 2009-01-07

(see https://stackoverflow.com/a/65592210/22084711 for more examples)

However, pd.util.testing is being deprecated. As far as I can tell, it is deprecated in favor of pd.testing, but pd.testing does not include any of the functions used for generating sample DataFrames (makeMixedDataFrame, makeMissingDataFrame, etc.).

Is this functionality being moved to some other module? I looked but couldn't find it anywhere else. I'd like an alternative that ships with pandas and doesn't require additional dependencies such as Seaborn, or downloading a DataFrame from somewhere else.

(I was going to ask on the pandas GitHub, but they require that all questions be asked on SO first.)

RMal

1 Answer

Actually, there are two different testing modules, so to speak: an official one (pandas.testing, documented in the API reference with only four functions as of 2.0.0+) and a second one intended for internal use.

So I guess you're looking for the latter (i.e., pandas._testing):

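import pandas as pd
# Tested with pandas 2.0.2 (check with pd.__version__)

# pandas._testing is an internal module, not part of the public API
df = pd._testing.makeMixedDataFrame()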

Output:

print(df)

     A    B     C          D
0  0.0  0.0  foo1 2009-01-01
1  1.0  1.0  foo2 2009-01-02
2  2.0  0.0  foo3 2009-01-05
3  3.0  1.0  foo4 2009-01-06
4  4.0  0.0  foo5 2009-01-07
Timeless
  • This seems very correct, thank you! I have not built python modules before with complexity on the level of Pandas, so one question for clarification. Do I understand correctly that relying on internal modules is generally not a good idea for non-ad-hoc use? E.g. I can imagine that if this is internal, users would not get deprecation notices, or that this functionality can go away without any deprecation period at all. – RMal Jun 16 '23 at 15:59
  • Yes, I think that relying on internal modules is generally not recommended for *non-ad-hoc* use. But that being said, you may consider opening an [issue](https://github.com/pandas-dev/pandas/issues) for more details/accurate insights from the core developers. – Timeless Jun 16 '23 at 16:07
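
Since the internal helpers come with no stability guarantee, one way to avoid depending on them is to build equivalent sample frames yourself using only public pandas and NumPy calls. The sketch below is a minimal illustration, not the pandas implementation: `make_mixed_dataframe` and `make_missing_dataframe` are hypothetical names, and the `density` and `seed` parameters are assumptions chosen for this example.

```python
import numpy as np
import pandas as pd


def make_mixed_dataframe() -> pd.DataFrame:
    """Hypothetical helper: a small mixed-dtype frame built with public API only."""
    return pd.DataFrame(
        {
            "A": [0.0, 1.0, 2.0, 3.0, 4.0],
            "B": [0.0, 1.0, 0.0, 1.0, 0.0],
            "C": ["foo1", "foo2", "foo3", "foo4", "foo5"],
            # Five consecutive business days starting on 2009-01-01
            "D": pd.bdate_range("2009-01-01", periods=5),
        }
    )


def make_missing_dataframe(
    nrows: int = 10, ncols: int = 4, density: float = 0.9, seed: int = 0
) -> pd.DataFrame:
    """Hypothetical helper: random floats with roughly (1 - density) of cells set to NaN."""
    rng = np.random.default_rng(seed)
    values = rng.standard_normal((nrows, ncols))
    # Blank out a random subset of cells
    values[rng.random((nrows, ncols)) > density] = np.nan
    return pd.DataFrame(values, columns=[f"col{i}" for i in range(ncols)])


if __name__ == "__main__":
    print(make_mixed_dataframe())
    print(make_missing_dataframe(nrows=5))
```

Keeping helpers like these in your own test utilities means a pandas upgrade cannot silently remove them, at the cost of a few lines of code to maintain.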