
I am trying to create a function that takes the name of a NumPy distribution as an argument and plots a histogram of random data points drawn from that distribution.

It's okay if it only works for NumPy distributions that require one argument. I'm just really stuck on how to build the np.random.<distribution>() call from the name.

# import libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline 

#Define a function (Fnc) that produces random numpy distributions (dstr)
#Fnc args: [npy dstr name as lst of str], [num of data pts]
def get_rand_dstr(dstr_name):
  npdstr = dstr_name
  dstr = np.random.npdstr(user.input("How many datapoints?"))
  #here pass each dstr from dstr_name through for loop
  #for loop will prompt user for required args of dstr (nbr of desired datapoints)
  return plt.hist(df)

get_rand_dstr('chisquare')
Candy
  • Any particular reason you don't just have your function take `np.random.chisquare` as an argument instead of a `'chisquare'` string? – user2357112 Nov 21 '19 at 04:12
  • @user2357112-supports-monica Other than it didn't come to mind (lol)... I'm new to statistics so I was hoping to write a function that would allow me to quickly visualize/compare different distributions. I'm also new to python so I wanted it as close to plain text as possible. But I'm going to give it a try because that seems like a great straight forward solution. thank you for your input! I'll comment how it goes. – Candy Nov 21 '19 at 04:37
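The suggestion in the comments, passing the function object rather than its name, might look roughly like the sketch below. The helper name `sample_distribution` and the keyword forwarding are assumptions made for illustration; they are not part of the original question.

```python
import numpy as np

def sample_distribution(sampler, size, **params):
    """Draw `size` samples by calling the sampler function directly.

    `sampler` is any np.random function that accepts a `size` keyword;
    extra distribution parameters (such as df) are forwarded unchanged.
    """
    return sampler(size=size, **params)

# chisquare needs df; standard_normal needs nothing beyond size.
chi = sample_distribution(np.random.chisquare, size=1000, df=2)
norm = sample_distribution(np.random.standard_normal, size=1000)
print(chi.shape, norm.shape)
```

The returned arrays can then be handed to plt.hist() as in the question.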

2 Answers


The accepted answer is incorrect and does not work. NumPy's random distribution functions take different required arguments, and size is a keyword argument that comes after them, so you cannot simply pass the number of samples as the first positional argument. That is why the example in the accepted answer returns the wrong number of samples: only 1, not the 5 that were requested. The first argument of chisquare is df, not size.
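A quick check makes the difference visible. This is only a sketch; the value df=1 in the second call is an arbitrary choice for illustration.

```python
import numpy as np

# The first positional parameter of chisquare is df (degrees of freedom),
# so this call returns ONE sample from a chi-square distribution with df=5.
single = np.random.chisquare(5)

# Passing size by keyword returns an array of 5 samples.
# df=1 is an arbitrary assumption for this illustration.
samples = np.random.chisquare(df=1, size=5)

print(np.ndim(single))  # number of dimensions of the single value
print(samples.shape)    # shape of the array of samples
```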

It's common to want to invoke functions by name. As well as not working, the accepted answer uses eval(), which is a commonly suggested solution to this problem. It is generally considered a bad idea for various reasons, chiefly that it will execute whatever string it is given.

A better way to achieve what you want is to define a dictionary that maps strings representing the names of functions to the functions themselves. For example:

import numpy as np
%matplotlib inline 
import matplotlib.pyplot as plt

DISTRIBUTIONS = {
    'standard_cauchy': np.random.standard_cauchy,
    'standard_exponential': np.random.standard_exponential,
    'standard_normal': np.random.standard_normal,
    'chisquare': lambda size: np.random.chisquare(df=1, size=size),
}

def get_rand_dstr(dstr_name):
    npdstr = DISTRIBUTIONS[dstr_name]
    size = int(input("How many datapoints?"))
    dstr = npdstr(size=size)
    return plt.hist(dstr)

get_rand_dstr('chisquare')

This works fine for the functions I made keys for. You could add more (there are 35, I think), but the problem is that they don't all have the same API: you can't call all of them with just size as an argument. For example, np.random.chisquare() requires the parameter df, the degrees of freedom, and other functions require other things. You could make assumptions about those parameters and wrap each function call, as I did above for chisquare, if that's what you want to do.
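If you do go down the route of fixing parameters, functools.partial is one way to write the wrappers without a lambda for each entry. The sketch below is only an illustration: the helper name `draw` and the parameter values (df=1, lam=1.0, a=2.0, b=5.0) are arbitrary assumptions, not recommendations.

```python
from functools import partial

import numpy as np

# Each entry is a callable that only needs `size`.
# The fixed parameter values are arbitrary assumptions for illustration.
DISTRIBUTIONS = {
    'standard_normal': np.random.standard_normal,
    'chisquare': partial(np.random.chisquare, df=1),
    'poisson': partial(np.random.poisson, lam=1.0),
    'beta': partial(np.random.beta, a=2.0, b=5.0),
}

def draw(dstr_name, size):
    """Look up a sampler by name and draw `size` samples from it."""
    try:
        sampler = DISTRIBUTIONS[dstr_name]
    except KeyError:
        # Unknown names fail with a clear message instead of running arbitrary code.
        raise ValueError(f"Unsupported distribution: {dstr_name!r}") from None
    return sampler(size=size)

samples = draw('beta', 500)
print(samples.shape)
```

Because only the names in the dictionary are accepted, a typo or an unexpected string raises an error rather than being evaluated.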

Matt Hall
  • Yes, this is the most workable code. Organizing the different distributions in a single dictionary is really nice. Good work, boss; I knew the concept, but I learned a useful way to organize code. – Joy Nov 25 '19 at 04:49

Use this code; it might help you. The image below shows the values and the plot.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline 

def get_rand_dstr(dstr_name):
#     npdstr = dstr_name  
    # To use other distributions, this line needs adjusting,
    # because the number of arguments differs between distributions.
    dstr = 'np.random.{}({})'.format(dstr_name, (input("How many datapoints?")))
    print(dstr)
    df = eval(dstr)
    print(df)
#     dstr1 = np.random.chisquare(int(input("How many datapoints?")))
#     print(dstr1)
    return plt.hist(df)

# get_rand_dstr('geometric')  
get_rand_dstr('chisquare')
Joy
  • !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! OMG it works like a DREAM. You are my hero, and probably several of my classmates heroes as well. Thank you! – Candy Nov 21 '19 at 17:28
  • No, I'm afraid this does not work at all. The first argument for `chisquare` is `df`, not `size`. I explained this in my answer. So in the example shown here, you are not getting a distribution, you are getting a single data point from the distribution with `df=1`. It's also considered a bad idea to use `eval()` at all, see https://stackoverflow.com/questions/1832940/why-is-using-eval-a-bad-practice. – Matt Hall Nov 21 '19 at 21:32
  • @Candy This solution does not work. I updated my answer to explain more about it. – Matt Hall Nov 23 '19 at 15:53