I'm a beginner in machine learning and I'm trying to learn through Kaggle's Titanic problem. I've finished my code and got an accuracy score of 0.78, but now I need to produce a CSV file with 418 entries plus a header row, and I don't know how to go about it.

This is an example of what I'm supposed to produce:

PassengerId,Survived
 892,0
 893,1
 894,0
 Etc.

The Survived values should come from my test_predictions.

This is my code:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

"""Assigning the train & test datasets' addresses to variables"""
train_path = "C:\\Users\\Omar\\Downloads\\Titanic Data\\train.csv"
test_path = "C:\\Users\\Omar\\Downloads\\Titanic Data\\test.csv"

"""Using pandas' read_csv() function to read the datasets
and then assigning them to their own variables"""
train_data = pd.read_csv(train_path)
test_data = pd.read_csv(test_path)

"""Using pandas' factorize() function to represent genders (male/female)
with binary values (0/1)"""
train_data['Sex'] = pd.factorize(train_data.Sex)[0]
test_data['Sex'] = pd.factorize(test_data.Sex)[0]

"""Replacing missing values in the training and test dataset with 0"""
train_data.fillna(0.0, inplace = True)
test_data.fillna(0.0, inplace = True)

"""Selecting features for training"""
columns_of_interest = ['Pclass', 'Sex', 'Age']

"""Dropping missing/NaN values from the training dataset"""
filtered_titanic_data = train_data.dropna(axis=0)

"""Using the predictive features in the data to handle the x axis"""
x = filtered_titanic_data[columns_of_interest]

"""The survival (what we're trying to find) is the y axis"""
y = filtered_titanic_data.Survived

"""Splitting the train data with test"""
train_x, val_x, train_y, val_y = train_test_split(x, y, random_state=0)

"""Assigning the DecisionClassifier model to a variable"""
titanic_model = DecisionTreeClassifier()

"""Fitting the x and y values with the model"""
titanic_model.fit(train_x, train_y)

"""Predicting the x-axis"""
val_predictions = titanic_model.predict(val_x)

"""Assigning the feature columns from the test to a variable"""
test_x = test_data[columns_of_interest]

"""Predicting the test by feeding its x axis into the model"""
test_predictions = titanic_model.predict(test_x)

"""Printing the prediction"""
print(val_predictions)

"""Checking for the accuracy"""
print(accuracy_score(val_y, val_predictions))

"""Printing the test prediction"""
print(test_predictions)
petezurich
Onur-Andros Ozbek
  • What is the question? How is your solution deficient - what does it do or not do that is incorrect? Are you getting errors/Exceptions? – wwii Sep 19 '18 at 18:20
  • `How to produce a CSV file with Python with specific entries?` – Onur-Andros Ozbek Sep 19 '18 at 18:21
  • Please read and follow the posting guidelines in the help documentation, as suggested when you created this account. [Minimal, complete, verifiable example](http://stackoverflow.com/help/mcve) applies here. We cannot effectively help you until you post your MCVE code and accurately describe the problem. We should be able to paste your posted code into a text file and reproduce the problem you described. – Prune Sep 19 '18 at 18:22
  • I've edited the question. – Onur-Andros Ozbek Sep 19 '18 at 18:23
  • Possible dupe: [How to write a numpy array to a csv file?](https://stackoverflow.com/q/24659814/2823755) – wwii Sep 19 '18 at 18:29
  • https://docs.python.org/3/library/csv.html – Nick Eu Sep 19 '18 at 18:30
  • You are usually provided with a sample submission file. If you have it as a DataFrame, then simply do `submission['Survived'] = test_predictions`. The next line creates the CSV file from the pandas DataFrame: `submission.to_csv('filename.csv', index=False)` – ipramusinto Sep 19 '18 at 21:10
  • Working with Keras you get floats, which you have to convert: `predicts = clfm.predict(titanic[predictors], batch_size=batch_size, verbose=1)` then `predictsnsub = [int(numpy.round(i)) for i in predicts]` – Max Kleiner May 01 '20 at 09:53

1 Answer

How about this:

submission = pd.DataFrame({ 'PassengerId': test_data.PassengerId.values, 'Survived': test_predictions })
submission.to_csv("my_submission.csv", index=False)
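If you want to try the idea without the Kaggle files, here is a minimal sketch. The three-row test_data and the test_predictions array are made-up stand-ins (assumptions for illustration) for the real test.csv and the model output from the question. It builds the two-column frame and returns the CSV as text so you can check the header and rows before writing a file.

```python
import numpy as np
import pandas as pd

# Stand-in data (hypothetical): in the question these would be
# test_data (read from test.csv) and test_predictions (from the model).
test_data = pd.DataFrame({
    "PassengerId": [892, 893, 894],
    "Pclass": [3, 3, 2],
})
test_predictions = np.array([0, 1, 0])

# There must be one prediction per test row, or the columns will not line up.
assert len(test_predictions) == len(test_data)

# Pair each passenger ID with its predicted label.
submission = pd.DataFrame({
    "PassengerId": test_data["PassengerId"],
    "Survived": test_predictions.astype(int),
})

# With no path, to_csv returns the CSV text; pass a filename to write a file.
csv_text = submission.to_csv(index=False)
print(csv_text)
```

Passing index=False matters here: without it pandas also writes the row index as an extra unnamed first column, which does not match the two-column format shown in the question.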
petezurich