I'm trying to make a scatter plot from a pandas DataFrame, but "ValueError: x and y must be the same size" keeps popping up. It looks like the Slaughter Steers column holds strings instead of floats, so I tried to convert it, but that raises ValueError: could not convert string to float: '1,062.6'. I also tried replacing ' with a space, but I still get the same error.

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np



#Read in Data set date as index
cattle_price = pd.read_csv('C:/Users/SkyLH/Documents/CattleForcast Model/Slaughter Cattle Monthly Data.csv', index_col = 'DATE')
cattle_slaughter = pd.read_csv('C:/Users/SkyLH/Documents/Cattle Forcast Model/SlaughterCountsFull - Sheet1.csv', index_col = 'Date')
cattle_price.index = pd.to_datetime(cattle_price.index)
cattle_price.index.names = ['Date',]


cattle_slaughter.replace("'"," ")
cattle_slaughter.astype(float)

cattle_df = cattle_price.join(cattle_slaughter, how = 'inner')

print(cattle_df)
plt.scatter(cattle_df, y = 'Price')
plt.show()

                Price Slaughter Steers
Date                                  
1955-01-01  34.899999            983.8
1955-02-01  35.999998            847.9
1955-03-01  34.600001          1,062.6
1955-04-01  35.800002          1,000.9
1955-05-01  33.100002          1,090.1
Skylee
  • try specifying the x and y values in `plt.scatter`, something like `plt.scatter(cattle_df.index, cattle_df.Price)` – Khalil Al Hooti Oct 11 '18 at 00:46
  • First, you aren't assigning the replacement back to anything, so you're never actually changing anything. Second, you should use `thousands=','` in your `pd.read_csv` lines, that way it reads the numbers properly and removes the commas. – ALollz Oct 11 '18 at 01:06
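The `thousands=','` suggestion in the comment above can be sketched as follows. This is a minimal example, not the asker's actual files: the in-memory `csv_text` is a hypothetical stand-in for the slaughter-counts CSV, and it assumes fields containing commas are quoted in the file.

```python
import io

import pandas as pd

# Hypothetical stand-in for the slaughter-counts CSV (not the real file).
# Values with a thousands separator are quoted so the comma is not read
# as a field delimiter.
csv_text = (
    'Date,Slaughter Steers\n'
    '1955-01-01,983.8\n'
    '1955-03-01,"1,062.6"\n'
    '1955-04-01,"1,000.9"\n'
)

# thousands=',' tells read_csv to treat commas inside numbers as
# separators, so the column is parsed as floats rather than strings.
cattle_slaughter = pd.read_csv(
    io.StringIO(csv_text),
    index_col='Date',
    parse_dates=True,
    thousands=',',
)

print(cattle_slaughter.dtypes)
```

Note also that `DataFrame.replace` and `DataFrame.astype` return new objects, so their results have to be assigned back (for example `cattle_slaughter = cattle_slaughter.astype(float)`) to have any effect.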

1 Answer

I believe the commas (thousands separators) are preventing the conversion. This question has possible solutions that may help you:

How do I use Python to convert a string to a number if it has commas in it as thousands separators?
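If the data is already loaded as strings, one possible approach is to strip the commas before converting. The sketch below assumes a small hypothetical frame with the same column name as in the question; it is an illustration, not necessarily the method the linked question recommends.

```python
import pandas as pd

# Hypothetical sample mirroring the string values shown in the question.
cattle_slaughter = pd.DataFrame(
    {'Slaughter Steers': ['983.8', '847.9', '1,062.6']}
)

# Remove the thousands separators, convert to float, and assign the
# result back to the column (the conversion does not happen in place).
cattle_slaughter['Slaughter Steers'] = (
    cattle_slaughter['Slaughter Steers']
    .str.replace(',', '', regex=False)
    .astype(float)
)

print(cattle_slaughter.dtypes)
```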

Whip
  • So I got rid of the commas in the pd.read_csv line, and .describe says that the count for both columns is 764, but I'm still getting ValueError: x and y must be the same size. – Skylee Oct 11 '18 at 02:18
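The remaining error appears to come from the `plt.scatter(cattle_df, y = 'Price')` call itself rather than from the data types: the whole DataFrame is passed as `x`, and without a `data=` argument the string `'Price'` is taken literally as `y`, so the two never have the same size. A minimal sketch of a call with explicit x and y columns follows. The sample frame is hypothetical, the choice of Slaughter Steers on the x-axis is an assumption about what the plot should show, and the `Agg` backend is used only so the example runs without a display.

```python
import matplotlib

matplotlib.use('Agg')  # non-interactive backend; drop this to use plt.show()

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample with both columns already numeric.
cattle_df = pd.DataFrame(
    {
        'Price': [34.899999, 35.999998, 34.600001],
        'Slaughter Steers': [983.8, 847.9, 1062.6],
    },
    index=pd.to_datetime(['1955-01-01', '1955-02-01', '1955-03-01']),
)
cattle_df.index.name = 'Date'

fig, ax = plt.subplots()

# Pass one column for x and one for y so both have the same length.
points = ax.scatter(cattle_df['Slaughter Steers'], cattle_df['Price'])
ax.set_xlabel('Slaughter Steers')
ax.set_ylabel('Price')
```

An equivalent form is `ax.scatter('Slaughter Steers', 'Price', data=cattle_df)`, where the `data=` argument lets the strings be looked up as column names.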