2

I have two CSV files that share columns such as PassengerId, Name, Sex, Age, etc.

I am trying to draw a box plot of the passengers' age distribution per title (Mr, Mrs, etc.), but I get an error. How can I fix the error so that the plot can be drawn?

import csv as csv
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
csv_file_object = csv.reader(open('test.csv', 'r')) 

header = next(csv_file_object)
data=[] 

for row in csv_file_object:
    data.append(row)
data = np.array(data) 

csv_file_object1 = csv.reader(open('train.csv', 'r')) 
header1 = next(csv_file_object1) 
data1=[] 

for row in csv_file_object:
    data1.append(row)
data1 = np.array(data1)


Mergerd_file = header.merge(header1, on='PassengerId')

df = pd.DataFrame(Mergerd_file, index=['pAge', 'Tilte'])

df.T.boxplot(vert=False)
plt.subplots_adjust(left=0.25)
plt.show()

I get this error:

  ---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-23-0d7fafc1fcf9> in <module>()
     21 
     22 
---> 23 Mergerd_file = header.merge(header1, on='PassengerId')
     24 
     25 df = pd.DataFrame(Mergerd_file, index=['pAge', 'Tilte'])

AttributeError: 'list' object has no attribute 'merge'
  • Just a note: Python 2 doesn't complain about that, but does complain about "AttributeError: '_csv.reader' object has no attribute 'merge'" later on. – doctorlove Dec 22 '16 at 13:37
  • Well, this has nothing to do with boxplot in pandas. By the way, if you use pandas, use `pd.read_csv()` directly to import your DataFrames, then `pd.concat`, and use `seaborn` to plot the boxplot. If your question is more about how to use the csv library, remove all the unnecessary parts, or ask a separate question and make this one clearer. – jrjc Dec 22 '16 at 13:48
  • My aim is to draw a box plot of the passengers' age distribution per title using pandas, jrjc. –  Dec 22 '16 at 13:57
  • I am a bit confused: there is no `title` column in the CSV. Do you mean the `Sex` column? – jezrael Dec 22 '16 at 13:59

2 Answers

2

The code you're using is for Python 2, yet you're running Python 3. In Python 3 (and recommended in Python 2.6+), the proper way to advance an iterator is to use the built-in next():

header = next(csv_file_object1)

Furthermore, the file should be opened in text mode 'r', not 'rb'.
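
A minimal sketch of that pattern, assuming Python 3. The helper name `read_rows` and the in-memory sample are hypothetical and only stand in for the `test.csv` from the question. It also illustrates why the `merge` call in the question fails: the header returned by `next()` is a plain list, so the rows have to be wrapped in a DataFrame before anything can be merged.

```python
import csv
import io

import pandas as pd


def read_rows(handle):
    """Return (header, rows) from an open text-mode CSV handle."""
    reader = csv.reader(handle)
    header = next(reader)  # built-in next() works in Python 2.6+ and 3
    rows = list(reader)    # remaining lines, each a list of strings
    return header, rows


# Hypothetical sample data; with a real file, pass
# open('test.csv', 'r', newline='') instead of the StringIO object.
sample = io.StringIO('PassengerId,Name,Age\n1,"Braund, Mr. Owen Harris",22\n')
header, rows = read_rows(sample)

print(type(header).__name__)  # a list, which has no .merge method

# A DataFrame built from the same rows does have .merge
df = pd.DataFrame(rows, columns=header)
```

With real files, `pd.read_csv` does all of this in one call, so the csv module is only needed if you want the raw rows.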

2

I think you need to use read_csv first, then concat both DataFrames, and finally create the boxplot:

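import pandas as pd
import matplotlib.pyplot as plt

# Read each file straight into a DataFrame
df1 = pd.read_csv('el/test.csv')
print(df1.head())

df2 = pd.read_csv('el/train.csv')
print(df2.head())

# Stack the rows of both files
df = pd.concat([df1, df2])
# Title is the text between ', ' and the first '.' in Name, e.g. 'Mr'
df['Title'] = df.Name.str.extract(r', (.*?)\.', expand=False)
print(df.head())

# One horizontal box of Age per Title
df[['Age','Title']].boxplot(vert=False, by='Title')
plt.subplots_adjust(left=0.25)
plt.show()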
jezrael
  • There is no separate column for the title. However, the Name column contains values such as "Braund, Mr. Owen Harris", so the title here is "Mr". –  Dec 22 '16 at 14:08
  • OK, you can try it yourself; you can also check this [answer](http://stackoverflow.com/q/33573408/2901002) – jezrael Dec 22 '16 at 14:27
  • Thank you so much for your answer :) –  Dec 22 '16 at 14:29
  • Thank you for accepting. A small piece of advice if you post a question in the future: check [How to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for better questions. Good luck! – jezrael Dec 22 '16 at 14:31
  • I tried to exclude all titles other than 'Dr', 'Mrs', 'Mr', 'Sir' using `df['Title'] = df.Name.str.extract(', (.*)\.', expand=False).isin(['Dr','Mrs','Mr' ,'Sir'])`, however I get only True and False? –  Dec 22 '16 at 18:38
  • You need boolean indexing, e.g. `df = df[df.Title.isin(['Dr','Mrs','Mr' ,'Sir'])]`, on a new line after extracting the title. – jezrael Dec 22 '16 at 19:13
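
The two-step approach from the last comment can be sketched as follows. The sample names and ages are hypothetical rows in the same "Surname, Title. Given names" format, and the variable names `keep` and `filtered` are only illustrative. Assigning the result of `isin()` to `df['Title']` overwrites the titles with True/False, which is why the mask should be used for row selection instead.

```python
import pandas as pd

# Hypothetical sample rows, not taken from the real files
df = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley",
        "Heikkinen, Miss. Laina",
        "Minahan, Dr. William Edward",
    ],
    "Age": [22.0, 38.0, 26.0, 44.0],
})

# Step 1: extract the title into its own column (strings, not booleans)
df["Title"] = df["Name"].str.extract(r", (.*?)\.", expand=False)

# Step 2: isin() returns a boolean mask; use it to select rows
keep = ["Dr", "Mrs", "Mr", "Sir"]
filtered = df[df["Title"].isin(keep)]

print(filtered["Title"].tolist())  # ['Mr', 'Mrs', 'Dr']
```

The filtered frame can then be passed to `boxplot(by='Title')` exactly as in the answer.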