I am trying to figure out how to get the x values that correspond to a given y value from a pandas Series plotted with matplotlib.

To be more precise, I need the x value at y = 0.5 for multiple columns. The data is normalized and cut according to user input.

I don't have enough data points to hit exactly 0.5 (the nearest values might be 0.4 or 0.6).

I thought it might be possible to draw a line at 0.5 and get the intersection points, or to interpolate the data somehow, but I do not really know how to do it properly.

Maybe someone has some suggestions?

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('test.csv', header=0, sep=',')

# Every column except the first ('Temperature') holds a measured signal
colnames = list(df.columns)
print(colnames)
colnames.pop(0)

# Min-max normalize each signal column to the range 0..1
for i in colnames:
    df[i] = (df[i] - df[i].min()) / (df[i].max() - df[i].min())

print(df)
df.plot(x='Temperature')
plt.show()

# Cut the data to the temperature window chosen by the user
y = input('Enter temperature: ')
d1 = df[df['Temperature'] >= int(y)]
a = input('Second temperature: ')
d2 = d1[d1['Temperature'] <= int(a)].copy()

# Re-normalize the signal columns inside the window
# (operate on d2 rather than df, and leave 'Temperature' untouched)
for i in colnames:
    d2[i] = (d2[i] - d2[i].min()) / (d2[i].max() - d2[i].min())
main = d2.plot(x='Temperature')
line = plt.axhline(y=0.5, color='black', linestyle='-')

plt.show()

p1 = d2.interpolate()

1 Answer

EDIT: I realised you're trying to predict x from y, not y from x, so I've swapped my variables around below. The problem is similar.

The best way of doing this depends on what the data relates to, how it is distributed, etc. However, one approach would be to train a linear model based on the data and then use the model to estimate the value of x when y equals 0.5.

For example, you could use one of the linear models offered by scikit-learn:

import pandas as pd
from sklearn import linear_model
import numpy as np
from io import StringIO

data = """y,x
0.1,4
0.2,8
0.3,12
0.4,16
0.6,24
"""

df = pd.read_csv(StringIO(data))

x = df[['x']]
y = df[['y']]

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the available data
regr.fit(y, x)

# Predict x for the missing y value (predict() returns an array of shape (1, 1))
interpolated = regr.predict(np.array([[0.5]]))

# Extract the single scalar from the array
print(interpolated.item())

...or you can fit a more sophisticated model based on what your data is like.
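As one possible instance of a more sophisticated model: if each normalized curve looks like a sigmoid, a Boltzmann function could be fitted with `scipy.optimize.curve_fit` and then inverted to read off x at y = 0.5. This is only a sketch under that assumption. The names `boltzmann` and `x_at_level`, the starting guesses, and the noise-free sample data are made up for illustration and are not taken from the question.

```python
import numpy as np
from scipy.optimize import curve_fit


def boltzmann(x, bottom, top, x50, slope):
    """Boltzmann sigmoid: rises from bottom to top, halfway at x = x50."""
    return bottom + (top - bottom) / (1.0 + np.exp((x50 - x) / slope))


def x_at_level(level, bottom, top, x50, slope):
    """Invert the sigmoid: the x at which the fitted curve reaches level."""
    return x50 - slope * np.log((top - bottom) / (level - bottom) - 1.0)


# Made-up, noise-free sample data (hypothetical, for illustration only)
temperature = np.array([20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0])
signal = boltzmann(temperature, 0.0, 1.0, 52.0, 5.0)

# Rough starting guesses: data range for bottom/top, median x for the
# midpoint, and a tenth of the x range for the slope (an assumed heuristic)
p0 = [signal.min(), signal.max(), np.median(temperature),
      (temperature.max() - temperature.min()) / 10.0]

params, _ = curve_fit(boltzmann, temperature, signal, p0=p0)
x_half = x_at_level(0.5, *params)
print(x_half)
```

With real, noisy data the fit can be sensitive to the starting guesses, so it is worth plotting the fitted curve over the points before trusting the result.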

  • Thanks, my model is not linear, but do you think it would be better if I tried to fit my data (a Boltzmann function should do the trick) and got the value from the fit? – Darius Šulskis May 23 '18 at 21:16
  • That seems as sensible as anything! Another plan is to take the two Y values nearest to the one you are trying to interpolate and take the average of those. This would be suitable for nonlinear data as long as the Y values you have are sufficiently near the one you're trying to interpolate. – Nicholas James Bailey May 23 '18 at 21:28
  • Great, I was also thinking about taking two data points. I will look into it. – Darius Šulskis May 23 '18 at 21:37
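The two-nearest-points idea from the comments can be sketched as linear interpolation between the two samples that bracket the target. This is a weighted version of the plain average, so it also works when 0.5 is not exactly halfway between the neighbours. The helper `x_at_y` and the sample DataFrame below are hypothetical; the sketch assumes the samples are ordered by Temperature and returns every crossing it finds, since a non-monotonic curve can cross 0.5 more than once.

```python
import numpy as np
import pandas as pd


def x_at_y(x, y, target=0.5):
    """Return every x where the piecewise-linear curve through (x, y) hits target."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d = y - target
    crossings = []
    for i in range(len(x) - 1):
        if d[i] == 0:
            # A sample sits exactly on the target
            crossings.append(x[i])
        elif d[i] * d[i + 1] < 0:
            # The target lies between samples i and i + 1: interpolate linearly
            frac = -d[i] / (d[i + 1] - d[i])
            crossings.append(x[i] + frac * (x[i + 1] - x[i]))
    if len(x) and d[-1] == 0:
        crossings.append(x[-1])
    return crossings


# Made-up data for illustration only
df = pd.DataFrame({
    "Temperature": [20, 30, 40, 50, 60],
    "A": [0.0, 0.2, 0.4, 0.6, 1.0],
    "B": [0.0, 0.1, 0.3, 0.9, 1.0],
})

result = {col: x_at_y(df["Temperature"], df[col])
          for col in df.columns if col != "Temperature"}
print(result)
```

For this made-up data, column A crosses 0.5 at roughly 45 and column B at roughly 43.3. The same points are where the plotted lines would intersect `plt.axhline(y=0.5)`.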