
I currently have a set of data points (hit counts), which are structured as a time series. The data is something like:

time   hits
20     200
32     439
57     512

How can I fit a curve to this data or find a formula so that I can predict points in the future? Ideally, I can answer a question like "How many views will there be when the time is 100?"

Thanks for your help!

EDIT: What I've tried so far:

I've tried a variety of methods, including:

  1. Creating a Logistic Regression using sklearn (however, there are no features for the data)

  2. Creating a curve fit using optimize.curve_fit from scipy (however, I don't have a function for the data)

  3. Creating a function from a UnivariateSpline to pass into curve_fit (something went wrong, I can't pin it down)

I'm trying to model when content goes viral, so I assume that a polynomial or exponential curve is ideal.

I tried the links from @Bill previously, but I have no function for the data. Do you know how I can find one?

EDIT 2:

Here's a sample of about two days of data: The Fox Data

Here is what is expected over time.

Brian Tompsett - 汤莱恩
cheese1756
  • A few questions: 1.) What have you tried so far? 2.) What kind of curve are you trying to fit - polynomial? exponential? loglinear? 3.) Have you looked at any documentation or related questions on this site, such as [this](http://stackoverflow.com/questions/19165259/python-numpy-scipy-curve-fitting) or [this](http://stackoverflow.com/questions/8280871/curve-fitting-with-python)? – wflynny May 12 '14 at 16:23
  • Thanks for the comment, @Bill. I've edited the post to include what I've tried so far. – cheese1756 May 12 '14 at 16:31
  • Without relevant domain knowledge, it is difficult to tell which model (logistic, linear, ...) to fit the data with. – K.Chen May 12 '14 at 17:50
  • In light of your edits, the real question is: how do I know what kind of curve fits my data? And the answer is that it varies from dataset to dataset. Your best bet is to try several and see which fits your data best. However, you're not just trying to fit your data, you're using your data to "train" a model which you can use to predict future values. Model training and validation is a huge field, and you're not going to get an easy answer to "which curve fits my data well and additionally predicts data well." – wflynny May 12 '14 at 17:51
  • However, if you post a plot of hits as a function of time, we can tell you if there is an obvious answer. – gg349 May 12 '14 at 17:53
  • Thanks for all of your input! I've posted two examples. The first one is short-term, showing the kind of data that I expect to see day-to-day. The second one is long-term, showing the type of trends I expect to eventually see. – cheese1756 May 12 '14 at 21:13

1 Answer

As other people have said, it is difficult to give an answer with so little information.

I suggest defining some new variables such as time, time*time, and time*time*time, and fitting a LinearRegression model using them as input features.
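A minimal sketch of that suggestion, using the three sample points from the question. The helper name `poly_features` and the choice of `degree = 2` are my own assumptions for illustration: with only three points, a cubic term would leave the fit underdetermined, so on real data you would raise the degree and check it against held-out points.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Sample points from the question (time, hits).
time = np.array([20.0, 32.0, 57.0])
hits = np.array([200.0, 439.0, 512.0])


def poly_features(t, degree):
    """Hypothetical helper: stack t, t**2, ..., t**degree as feature columns."""
    t = np.asarray(t, dtype=float)
    return np.column_stack([t ** d for d in range(1, degree + 1)])


# Assumption: degree 2, because three points cannot constrain a cubic.
degree = 2

# LinearRegression fits an intercept by default, so no constant column is needed.
model = LinearRegression().fit(poly_features(time, degree), hits)

# Predict the hit count at time 100.
predicted = model.predict(poly_features([100.0], degree))[0]
print(f"Predicted hits at time 100: {predicted:.1f}")
```

Keep in mind that a polynomial fitted to a handful of points can extrapolate badly far outside the observed range, so treat the prediction at time 100 as a sanity check of the approach rather than a forecast.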

I would start with these and then, if needed, move on to something more complex such as a neural network (not in sklearn) or SVR.

Hope this helps.

Donbeo