How to produce a Kaggle submission CSV file with specific entries?

Question

I'm a beginner in machine learning and I'm trying to learn through Kaggle's Titanic problem. I've finished my code and got an accuracy score of 0.78, but now I need to produce a CSV file with 418 entries plus a header row, and I don't know how to go about it.

This is an example of what I'm supposed to produce:

PassengerId,Survived
892,0
893,1
894,0
etc.

The values for the Survived column come from my test_predictions array.

This is my code:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

"""Assigning the train & test datasets' paths to variables"""
train_path = "C:\\Users\\Omar\\Downloads\\Titanic Data\\train.csv"
test_path = "C:\\Users\\Omar\\Downloads\\Titanic Data\\test.csv"

"""Using pandas' read_csv() function to read the datasets
and then assigning them to their own variables"""
train_data = pd.read_csv(train_path)
test_data = pd.read_csv(test_path)

"""Using pandas' factorize() function to represent genders (male/female)
with binary values (0/1)"""
train_data['Sex'] = pd.factorize(train_data.Sex)[0]
test_data['Sex'] = pd.factorize(test_data.Sex)[0]

"""Replacing missing values in the training and test dataset with 0"""
train_data.fillna(0.0, inplace = True)
test_data.fillna(0.0, inplace = True)

"""Selecting features for training"""
columns_of_interest = ['Pclass', 'Sex', 'Age']

"""Dropping missing/NaN values from the training dataset"""
filtered_titanic_data = train_data.dropna(axis=0)

"""Using the predictor features in the data as the model input (x)"""
x = filtered_titanic_data[columns_of_interest]

"""The survival (what we're trying to find) is the y axis"""
y = filtered_titanic_data.Survived

"""Splitting the training data into training and validation sets"""
train_x, val_x, train_y, val_y = train_test_split(x, y, random_state=0)

"""Assigning the DecisionTreeClassifier model to a variable"""
titanic_model = DecisionTreeClassifier()

"""Fitting the x and y values with the model"""
titanic_model.fit(train_x, train_y)

"""Predicting survival for the validation features"""
val_predictions = titanic_model.predict(val_x)

"""Assigning the feature columns from the test to a variable"""
test_x = test_data[columns_of_interest]

"""Predicting the test by feeding its x axis into the model"""
test_predictions = titanic_model.predict(test_x)

"""Printing the prediction"""
print(val_predictions)

"""Checking for the accuracy"""
print(accuracy_score(val_y, val_predictions))

"""Printing the test prediction"""
print(test_predictions)

What is the question? How is your solution deficient - what does it do or not do that is incorrect? Are you getting errors/Exceptions? — wwii, Sep 19 '18 at 18:20
`How to produce a CSV file with Python with specific entries?` — Onur-Andros Ozbek, Sep 19 '18 at 18:21
Please read and follow the posting guidelines in the help documentation, as suggested when you created this account. [Minimal, complete, verifiable example](http://stackoverflow.com/help/mcve) applies here. We cannot effectively help you until you post your MCVE code and accurately describe the problem. We should be able to paste your posted code into a text file and reproduce the problem you described. — Prune, Sep 19 '18 at 18:22
Possible dupe: [How to write a numpy array to a csv file?](https://stackoverflow.com/q/24659814/2823755) — wwii, Sep 19 '18 at 18:29
You are usually provided with a sample submission file. If you have it as a DataFrame, then simply do `submission['Survived'] = test_predictions`. The next line creates the CSV file from the pandas DataFrame: `submission.to_csv('filename.csv', index=False)` — ipramusinto, Sep 19 '18 at 21:10
Working with Keras you get floats, which you have to convert: `predicts = clfm.predict(titanic[predictors], batch_size=batch_size, verbose=1)` followed by `predictsnsub = [int(numpy.round(i)) for i in predicts]` — Max Kleiner, May 01 '20 at 09:53
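
The two suggestions in the comments above can be combined in a minimal sketch. It assumes a small in-memory DataFrame as a stand-in for the sample submission file, and `raw_predictions` is a hypothetical array of float model outputs (neither appears in the original post). The floats are rounded and cast to int before being assigned to the `Survived` column:

```python
import numpy as np
import pandas as pd

# Stand-in for the sample submission file (assumption: same two columns).
submission = pd.DataFrame({"PassengerId": [892, 893, 894],
                           "Survived": [0, 0, 0]})

# Hypothetical float outputs, shaped (n, 1) as some models return them.
raw_predictions = np.array([[0.12], [0.87], [0.49]])

# Round to the nearest class, flatten to 1-D, and cast to int.
submission["Survived"] = np.round(raw_predictions).ravel().astype(int)

print(submission)
```

With a classifier such as `DecisionTreeClassifier`, `predict` already returns class labels, so the rounding step should not be needed there.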

score 4 · Accepted Answer · answered Sep 19 '18 at 19:37

How about this:

submission = pd.DataFrame({'PassengerId': test_data['PassengerId'].values, 'Survived': test_predictions})
submission.to_csv("my_submission.csv", index=False)
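
As a self-contained sketch of the same idea, the snippet below uses a three-row stand-in for `test_data` and a hypothetical `test_predictions` array (both are assumptions for illustration, not the real Kaggle data). It checks that there is exactly one prediction per test row before building the frame, which is what determines the number of entries in the file:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the question's test_data and test_predictions.
test_data = pd.DataFrame({"PassengerId": [892, 893, 894],
                          "Pclass": [3, 3, 2]})
test_predictions = np.array([0, 1, 0])

# The submission needs one prediction per row of the test set.
assert len(test_predictions) == len(test_data)

submission = pd.DataFrame({
    "PassengerId": test_data["PassengerId"],
    "Survived": test_predictions.astype(int),
})

# With no path, to_csv returns the CSV text; index=False omits the row index.
csv_text = submission.to_csv(index=False)
print(csv_text)
```

Passing a filename, as in the answer's `to_csv("my_submission.csv", index=False)`, writes the same content to disk. Since the predictions are made on every row of test.csv, the file should end up with as many entries as the test set has rows, plus the header.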

petezurich

How do I limit it to 418 entries? – Onur-Andros Ozbek Sep 24 '18 at 01:41
Try `test_data['PassengerId'].values[:418]` and `test_predictions[:418]` – petezurich Sep 24 '18 at 06:53
Thank you. I've accepted and upvoted your answer. If you think that this was a well-asked question, could you give me an upvote? – Onur-Andros Ozbek Oct 24 '18 at 18:54
@OnurOzbek Thanks. And sure – I have already done so. You had a downvote before that. So right now this equals to null... – petezurich Oct 24 '18 at 20:09
