
I am relatively new to Python, and while doing my homework I ran into the following problem.

I am just getting started with TensorFlow and pandas.

import pandas as pd

zerlite_13X_error = pd.read_csv("zerlite_13x_error.csv", sep=",")


def preprocess_features(zerlite_13X_error):
    """Prepares input features from zerlite_13X_error.

    Args:
        zerlite_13X_error: A pandas DataFrame expected to contain data.

    Returns:
        A DataFrame that contains the features to be used for the model,
        including synthetic features.
    """
    # Select the eight feature columns; raises KeyError if any name is absent.
    selected_features = zerlite_13X_error[
        ["Parameter 1",
         "Parameter 2",
         "Parameter 3",
         "Parameter 4",
         "Parameter 5",
         "Parameter 6",
         "Parameter 7",
         "Parameter 8"]]
    processed_features = selected_features.copy()
    print(processed_features.head())
    return processed_features


preprocess_features(zerlite_13X_error)

KeyError: "['Parameter 7', 'Parameter 8', 'Parameter 2', 'Parameter 3', 'Parameter 4', 'Parameter 5', 'Parameter 6'] not in index" in preprocess_features(zerlite_13X_error)

shahaf
yulin wang
  • it seems the columns of your df do not contain e.g. 'Parameter 1' etc. - what does `print(zerlite_13X_error.columns.values)` tell you? And by the way, maybe [this](https://stackoverflow.com/questions/11285613/selecting-multiple-columns-in-a-pandas-dataframe) Q&A is helpful for what you want to do. – FObersteiner Oct 05 '19 at 09:36
  • Thanks mate, I have got the right answer! 'print(zerlite_13X_error.columns.values)' really helps ! – yulin wang Oct 05 '19 at 09:47

2 Answers


There are two straightforward ways to diagnose your problem:

Approach 1

Open the source file with any text editor and look at the first row. It should contain the column names, separated with commas. In your case it should be something like:

Parameter 1,Parameter 2,Parameter 3,Parameter 4,Parameter 5,Parameter 6,Parameter 7,Parameter 8

(and maybe some other columns).

Approach 2

Just after read_csv add:

print(zerlite_13X_error.columns)

This printout will show column names of the DataFrame just read.

In either case

Take a look at the list of column names. It should contain every column from "your" list.

If some columns are missing, correct the title row accordingly and run your program again.
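The check described above can also be done programmatically. The following is a minimal sketch, not the asker's actual data: the CSV text is a hypothetical stand-in in which only the first header matches exactly, and the variable names `csv_text`, `expected` and `missing` are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: the 2nd and 3rd headers
# carry a leading space, so only "Parameter 1" matches exactly.
csv_text = "Parameter 1, Parameter 2, Parameter 3\n1,2,3\n4,5,6\n"
df = pd.read_csv(io.StringIO(csv_text))

# Column names the code intends to select.
expected = ["Parameter 1", "Parameter 2", "Parameter 3"]

# Names requested by the code but absent from the DataFrame.
missing = [name for name in expected if name not in df.columns]

print(df.columns.tolist())  # actual names, including any stray spaces
print(missing)              # names that would trigger the KeyError
```

Printing the names as a list (rather than the DataFrame itself) shows the quotes around each name, which makes stray spaces easy to spot.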

Another possible source of error is extra spaces after the commas (or at the beginning of the title row). By default, read_csv does not strip such spaces: it just splits the title row on the separator character (in this case a comma), and each resulting "segment" becomes the name of the respective column.

In that case, the extra spaces become the leading characters of the column names.
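If stray spaces turn out to be the cause, there are two common ways to deal with them without editing the file. This is a sketch under the assumption that the header looks like the hypothetical `csv_text` below; `skipinitialspace` is a documented read_csv parameter that skips spaces following the delimiter, and `str.strip` on the column index removes both leading and trailing whitespace.

```python
import io

import pandas as pd

# Hypothetical header with a space after each comma.
csv_text = "Parameter 1, Parameter 2, Parameter 3\n1,2,3\n"

# Option A: let read_csv skip spaces that follow the delimiter.
df_a = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Option B: read the file as-is, then strip whitespace from every column name.
df_b = pd.read_csv(io.StringIO(csv_text))
df_b.columns = df_b.columns.str.strip()

print(df_a.columns.tolist())
print(df_b.columns.tolist())
```

Option B also handles trailing spaces in the names, which `skipinitialspace` does not address.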

Yet another possible source of error is that the source file does not contain a title row at all. In this case you should pass your own list of column names to read_csv (the names parameter), in the order corresponding to the actual content of the input file.
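For the no-title-row case, a minimal sketch could look like this. The data and the three column names are hypothetical; `header=None` tells read_csv that the first line is data, and `names` supplies the column names in file order.

```python
import io

import pandas as pd

# Hypothetical file content with no title row: the first line is already data.
csv_text = "1,2,3\n4,5,6\n"

# Names supplied by hand, in the same order as the columns in the file.
column_names = ["Parameter 1", "Parameter 2", "Parameter 3"]

df = pd.read_csv(io.StringIO(csv_text), header=None, names=column_names)

print(df.columns.tolist())
print(df.shape)
```

If the list has the wrong order, the data will load without any error but under the wrong names, so it is worth checking a few rows with `df.head()`.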

Note also that sep=',' is not needed, as ',' is the default value of this parameter. Following the Keep It Simple rule, avoid passing parameters with their default values.

Valdi_Bo

The column names contain a space, for example "Parameter 1". One way to get rid of it is to rename the column:

import pandas as pd
fileName = "zerlite_13x_error.csv"  # path to the CSV file from the question
df = pd.read_csv(fileName)
df = df.rename({"Parameter 1": "Parameter_1"}, axis=1)
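The rename above touches a single column. If the goal is to replace the space in every column name, one possible sketch (with hypothetical two-column data) is to rewrite the whole column index at once:

```python
import io

import pandas as pd

# Hypothetical data with spaces inside the column names.
csv_text = "Parameter 1,Parameter 2\n1,2\n"
df = pd.read_csv(io.StringIO(csv_text))

# Replace every space in every column name with an underscore.
df.columns = df.columns.str.replace(" ", "_", regex=False)

print(df.columns.tolist())
```

After such a rename, any later selection has to use the new names ("Parameter_1" and so on), otherwise the same KeyError comes back.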
S.B