Questions tagged [zipf]

Zipf's law (/ˈzɪf/) is an empirical law formulated using mathematical statistics that refers to the fact that many types of data studied in the physical and social sciences can be approximated with a Zipfian distribution, one of a family of related discrete power law probability distributions.

The law is named after the American linguist George Kingsley Zipf (1902–1950), who popularized it and sought to explain it, though he did not claim to have originated it.
Source: https://en.wikipedia.org/wiki/Zipf%27s_law

25 questions
6
votes
2 answers

curve fitting zipf distribution matplotlib python

I tried to fit the following plot (red dots) with the Zipf distribution PDF in Python, F ~ x^(-a). I simply chose a = 0.56 and plotted y = x^(-0.56), and I got the curve shown below. The curve is obviously wrong. I don't know how to do the curve…
manxing
  • 3,165
  • 12
  • 45
  • 56
4
votes
1 answer

Zipf Distribution based number generation

I want to generate a popularity distribution for a small data set, which should follow Zipf's law. The available parameters are: total number of viewers: 1 million; total number of videos: 36. I want to associate total number of viewers to each…
yours
  • 51
  • 1
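
One way to approach the allocation described in the excerpt above is to weight each rank by 1/rank^s and scale the weights to the total. This is only a minimal sketch: the exponent s = 1 is an assumption (the excerpt does not state one), and `zipf_allocate` is a hypothetical helper name.

```python
def zipf_allocate(total, n_items, s=1.0):
    """Split `total` across `n_items` ranks in proportion to 1 / rank**s."""
    # s = 1.0 is an assumed exponent, not a value given in the question.
    weights = [1.0 / rank ** s for rank in range(1, n_items + 1)]
    norm = sum(weights)
    # Floor each share so the counts are integers.
    counts = [int(total * w / norm) for w in weights]
    # Flooring leaves a small remainder; hand it out one unit at a time,
    # starting from the top rank, so the counts still sum to `total`.
    remainder = total - sum(counts)
    for i in range(remainder):
        counts[i % n_items] += 1
    return counts


viewers = zipf_allocate(1_000_000, 36)
```

Here `viewers[0]` is the count for the most popular video, and the list sums to exactly 1,000,000.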
3
votes
1 answer

Zipf Distribution: How do I measure Zipf Distribution

How do I measure or find the Zipf distribution? For example, I have a corpus of English words. How do I find the Zipf distribution? I need to find the Zipf distribution and then plot a graph of it. But I am stuck in the first step which is to find…
RDM
  • 1,136
  • 3
  • 28
  • 50
3
votes
1 answer

Zipf Distribution: How do I measure Zipf Distribution using Python / Numpy

I have a file (let's say corpus.txt) of around 700 lines, each line containing numbers separated by hyphens (-). For example: 86-55-267-99-121-72-336-89-211 59-127-245-343-75-245-245 First I need to read the data from the file, find the frequency of each…
RDM
  • 1,136
  • 3
  • 28
  • 50
2
votes
2 answers

How to find frequency of ten most common words in a file?

I'm writing a function in Python that takes the name of a text file (as a string) as input. The function should first determine how many times each word appears in the file. Later, I will make a bar chart showing the frequency of the ten most common…
Stiff
  • 47
  • 7
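
The counting step described in the excerpt above could be sketched with `collections.Counter`. This is an illustration under assumptions rather than the asker's code: `top_words` is a hypothetical name, the tokenizing regex is a simple choice of mine, and the function takes text directly (reading the file into a string first is left to the caller).

```python
import re
from collections import Counter


def top_words(text, n=10):
    """Return the n most common words in `text` as (word, count) pairs."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


sample = "the cat and the dog and the bird"
top = top_words(sample, 3)
```

For a file, the same function can be applied to the result of reading it, for example `top_words(open(path).read())`, where `path` is whatever file name the caller supplies.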
2
votes
1 answer

How to correctly use ZipfDistribution from the Apache Commons Math library in Java?

I want to create a source of data (in Java) based on words (from a dictionary) that follow a Zipf distribution. So I came across ZipfDistribution and NormalDistribution of the Apache Commons library. Unfortunately, information about how to use these…
Felipe
  • 7,013
  • 8
  • 44
  • 102
2
votes
1 answer

How to calculate the optimal zipf distribution of word frequencies in a text

For a homework assignment I have to plot the word frequencies of a text and compare them to an optimal Zipf distribution. Plotting the counted word frequencies of the text according to their rank in a log-log graph seems to work fine. But I have…
Dafuq
  • 25
  • 4
2
votes
1 answer

What do the parameters in scipy.stats.zipf mean?

From the docs: The probability mass function for zipf is: zipf.pmf(k, a) = 1/(zeta(a) * k**a) for k >= 1. zipf takes a as shape parameter. The probability mass function above is defined in the “standardized” form. To shift distribution use the loc…
alvas
  • 115,346
  • 109
  • 446
  • 738
2
votes
1 answer

Constructing Zipf Distribution with matplotlib, FITTED-LINE

I have a list of paragraphs, where I want to run a zipf distribution on their combination. My code is below: from itertools import * from pylab import * from collections import Counter import matplotlib.pyplot as plt paragraphs = "…
AlpU
  • 363
  • 1
  • 9
  • 26
1
vote
0 answers

Generating data of fixed size from a given set using Zipf probability distribution

I need to generate data sets from a given set using discrete probability distributions like Zipf, geometric, etc. For example, say we are given a set of elements A = (1, 2, 3, 4, 5). I need to generate a data set of size 100 such that: - the data set consists of…
1
vote
1 answer

How to choose interpolation points to reduce maximum error for inverse CDF lookup

QUESTION: How do I pick interpolation points that keep the maximum error for any point in each interpolated segment within a specified bound? The goal is to shape a random distribution according to Zipf's law using inverse transform sampling. I have…
Paul Chernoch
  • 5,275
  • 3
  • 52
  • 73
1
vote
1 answer

Plotting a "perfect" Zipf distribution from data on gnuplot

My goal is to have a simple .dat file and, from it, to plot both the actual data and the theoretical points of a perfect Zipf distribution, that is, a distribution where every item has a value equal to 1/(rank). For instance, my data for most…
Andycyca
  • 13
  • 4
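
The question above concerns gnuplot, but the theoretical column it describes can be sketched in a few lines of Python to show the arithmetic. The sketch assumes the "perfect" values are scaled so that rank 1 matches the largest observed value (the excerpt only says 1/(rank)); `perfect_zipf` and the sample numbers are hypothetical.

```python
def perfect_zipf(values):
    """Pair each observed value with its ideal Zipf value top / rank."""
    ordered = sorted(values, reverse=True)
    top = ordered[0]
    # Assumption: the ideal curve is anchored at the largest observed value.
    return [(rank, actual, top / rank)
            for rank, actual in enumerate(ordered, start=1)]


# Made-up data for illustration only.
rows = perfect_zipf([120, 30, 62, 41])
```

Each tuple in `rows` holds the rank, the observed value, and the ideal value, which is the shape of table a plotting tool would need as three columns.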
0
votes
0 answers

How do I generate numbers according to zipf's law in C++?

For an array arrayType arrayName[size] = . . ., I want a function arrayType sample() { . . . } to return items from arrayName according to zipf's law (item in position n should have a frequency of m/n where m is the frequency of the first and most…
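
The sampling rule in the excerpt above (the item in position n drawn with frequency proportional to 1/n) can be illustrated as follows. The question asks about C++; this Python sketch only shows the idea, and `make_zipf_sampler` is a hypothetical name. It relies on the standard library's `random.choices`, which accepts relative weights.

```python
import random


def make_zipf_sampler(items, rng=None):
    """Return a sample() function drawing items[n-1] with weight 1/n."""
    rng = rng or random.Random()
    # Position n (1-based) gets relative weight 1/n, i.e. m/n up to scaling.
    weights = [1.0 / n for n in range(1, len(items) + 1)]

    def sample():
        return rng.choices(items, weights=weights, k=1)[0]

    return sample


sample = make_zipf_sampler(["a", "b", "c", "d"], random.Random(0))
draws = [sample() for _ in range(10_000)]
```

Over many draws the first item should appear roughly four times as often as the fourth, since the weights are 1 and 1/4.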
0
votes
2 answers

dot net core upload zip file - empty file

I have a POST to upload a file and it works for pdf, txt, docx. When I try to upload a zip file, I can't extract the files because the file is empty. Any ideas? Thanks in advance. public retOp UploadFile(IFormFile file) { try { string PathFile =…
maccarilab
  • 11
  • 1
0
votes
1 answer

NIFI - upload binary.zip to SQL Server as varbinary

I am trying to upload a binary.zip to SQL Server as varbinary type column content. Target Table: CREATE TABLE myTable ( zipFile varbinary(MAX) ); My NIFI Flow is very simple: -> GetFile: filter:binary.zip -> UpdateAttribute:
Leonardo
  • 83
  • 1
  • 3