How do you create a Kolmogorov-Smirnov chart in python for a 2 sample string lists?

Question

I am having difficulty in creating a Kolmogorov-Smirnov chart for 2 sample string lists that will display Cumulative Distribution Function (CDF)?

As shown in Two-sample Kolmogorov–Smirnov test

I have been able to calculate the Ks_2sampResult(statistic=0.12939662567915355, pvalue=0.4183080902726968) for the overall string lists however the difficulty is how to chart this that will display the Cumulative Frequency Distribution (CFS)

Data List Samples

Subset of List 1 e.g. ['team', 'new', 'estate', 'ho', 'ur', 'la', 'pak', 'ebay', 'biz', 'best']

Subset of List 2 e.g. ['ilsilenzio', 'stilllife', 'mathiasboe', 'achininimeshikaratnasiri', 'andrewdabeka', 'davekhodabux', 'lilytermetz', 'marianhorsley', 'lindacloutier', 'moniquehoogland',]

Get the data lists

    def getWikidataList():
    with open('./file1.csv', 'r') as f:
        read = csv.reader(f) 
        next(read)
        for line in read:
            lineitem = line[0].split()[0]
            item1 = process_text(lineitem)
            wikidatalist.append(item1[0])
        return wikidatalist


def getTwitterList1():
    with open('file2.csv', 'r') as f:
        read = csv.reader(f) 
        next(read)
        for line in read:
            lineitem = line[0].split()[0]
            item1 = process_text(lineitem)
            twitterdatalist.append(item1[0])
        return twitterdatalist



wikidatalist = getWikidataList()
twitterData = getTwitterList1()

Plot the CDFs

def ks_plot_comp(data_1, data_2):

    plt.figure(figsize=(12, 7))
    plt.plot(data_1, np.linspace(0, 1, len(data_1), endpoint=False))
    plt.plot(data_2, np.linspace(0, 1, len(data_2), endpoint=False))
    plt.legend('top right')
    plt.legend(['wikidata', 'twitterdata'])
    plt.title('Comparing 2 CDFs for KS-Test')
    plt.show()    
    ks_plot_comp(wikidatalist, twitterData)

I just reviewed an edit to this question. I recognized, that there is still a formatting issue in the last function. Can anyone fix that please? (I can't at the moment, because the edit is still pending and I already approved it) — jottbe, Sep 03 '19 at 07:26
Hi Zoe & ASh, Thanks so much for correcting my mistake.I can't seem to post the image as my reputation has been reduced to a 7. It was one mistake... 4 points .... really? — Polly, Sep 03 '19 at 18:49
Maybe you should post your image now ;-) Sorry, I don't know why there are so may people who like to downvote here. For some it seems to be a hobby. — jottbe, Sep 03 '19 at 18:55
No problem, sorry that I can't help you with your original problem. — jottbe, Sep 03 '19 at 19:08

How do you create a Kolmogorov-Smirnov chart in python for a 2 sample string lists?

0 Answers0