The Scenario

I have 2 CSV files, (1) u.Data and (2) prediction_matrix, which I need to read and write into a single DataFrame. Once that is done, the DataFrame is processed for clustering based on the int / float values it contains.

The Problem

I have combined the 2 CSVs into 1 DataFrame, written to AllData.csv, but the columns holding the values now have a different type (object), as shown below (with a warning):

sys:1: DtypeWarning: Columns (0,1,2) have mixed types. Specify dtype option on import or set low_memory=False.
UDATA -------------
uid    int64
iid    int64
rat    int64
dtype: object
PRED_MATRIX -------
uid      int64
iid      int64
rat    float64
dtype: object
AllDATA -----------
uid    object
iid    object
rat    object
dtype: object

P.S. I know how to use low_memory=False, but that just suppresses the warning.

The Possible Cause

with open('AllData.csv', 'w') as handle:
    udata_df.to_csv(handle, index=False)
    pred_matrix.to_csv(handle, index=False)

Since I need to write the 2 CSVs into a single DataFrame, I use a file handle object, and that is probably what turns all the values into object type. Can anything preserve the data types while applying the same logic?

Unhelpful References taken so far:

  1. This one
  2. This two
  3. This too!
T3J45
  • Here's something I'd like to add: `AD_Matrix = AllData.drop_duplicates(subset=['uid','iid'])`, so I guess that removes the header. Here's the output: `sys:1: DtypeWarning: Columns (0,1,2) have mixed types. Specify dtype option on import or set low_memory=False. UDATA ------------- uid int64 iid int64 rat int64 dtype: object PRED_MATRIX ------- uid int64 iid int64 rat float64 dtype: object AllDATA ----------- 196 object 242 object 3 object dtype: object` @jezrael – T3J45 Aug 10 '17 at 08:28
  • No, `AllData.drop_duplicates(subset=['uid','iid'])` doesn't remove the header, it only removes duplicates. – jezrael Aug 10 '17 at 08:29

2 Answers

The problem is that the header of the second DataFrame is written to the file too, so you need the parameter header=False:

with open('AllData.csv', 'w') as handle:
    udata_df.to_csv(handle, index=False)
    pred_matrix.to_csv(handle, index=False, header=False)

Another solution is mode='a', which appends the second DataFrame to the file:

f = 'AllData.csv'
udata_df.to_csv(f, index=False)
pred_matrix.to_csv(f,header=False, index=False, mode='a')

Or use concat:

f = 'AllData.csv'
pd.concat([udata_df, pred_matrix]).to_csv(f, index=False)

Sample:

import pandas as pd

udata_df = pd.DataFrame({'uid':[1,2],
                         'iid':[8,9],
                         'rat':[0,3]})

pred_matrix = udata_df * 10

Without header=False, the third row of the file is a second header (row 2 in the output):

with open('AllData.csv', 'w') as handle:
    udata_df.to_csv(handle, index=False)
    pred_matrix.to_csv(handle, index=False)

f = 'AllData.csv'
df = pd.read_csv(f)
print (df)
   iid  rat  uid
0    8    0    1
1    9    3    2
2  iid  rat  uid
3   80    0   10
4   90   30   20
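
The stray header row is also what makes the columns non-numeric (`object` in the question's output): once a column holds the string `uid` next to numbers, pandas can no longer parse it as a number. A minimal sketch, assuming the sample frames above and using an in-memory buffer in place of `AllData.csv` (the names `buf`, `bad`, and `fixed` are made up for illustration), that shows the effect and one possible way to repair a file that was already written this way:

```python
import io
import pandas as pd

udata_df = pd.DataFrame({'uid': [1, 2], 'iid': [8, 9], 'rat': [0, 3]})
pred_matrix = udata_df * 10

# Write both frames with their headers, as in the question.
buf = io.StringIO()
udata_df.to_csv(buf, index=False)
pred_matrix.to_csv(buf, index=False)
buf.seek(0)

bad = pd.read_csv(buf)
print(bad.dtypes)  # no column is numeric, because of the embedded header row

# Repair: drop rows that repeat the header, then convert back to numbers.
fixed = bad[bad['uid'] != 'uid'].apply(pd.to_numeric).reset_index(drop=True)
print(fixed.dtypes)
```

Writing the file correctly in the first place is still the cleaner fix; the repair step is only useful when the bad file already exists.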

After adding the parameter header=False, it works correctly:

with open('AllData.csv', 'w') as handle:
    udata_df.to_csv(handle, index=False)
    pred_matrix.to_csv(handle, index=False, header=False)

f = 'AllData.csv'
df = pd.read_csv(f)
print (df)
   iid  rat  uid
0    8    0    1
1    9    3    2
2   80    0   10
3   90   30   20

Append mode (mode='a') solution:

f = 'AllData.csv'
udata_df.to_csv(f, index=False)
pred_matrix.to_csv(f,header=False, index=False, mode='a')
df = pd.read_csv(f)
print (df)
   iid  rat  uid
0    8    0    1
1    9    3    2
2   80    0   10
3   90   30   20

concat solution:

f = 'AllData.csv'
pd.concat([udata_df, pred_matrix]).to_csv(f, index=False)
df = pd.read_csv(f)
print (df)
   iid  rat  uid
0    8    0    1
1    9    3    2
2   80    0   10
3   90   30   20
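
If the combined data is only needed for further processing, the CSV round trip can be skipped, since `pd.concat` keeps numeric dtypes in memory. A small sketch with made-up values shaped like the question's frames (`rat` is integer in one frame and float in the other); `all_data` is an illustrative name, not one from the question:

```python
import pandas as pd

udata_df = pd.DataFrame({'uid': [1, 2], 'iid': [8, 9], 'rat': [0, 3]})
pred_matrix = pd.DataFrame({'uid': [1, 2], 'iid': [10, 11], 'rat': [2.5, 4.1]})

# ignore_index=True renumbers the rows 0..n-1 instead of repeating 0, 1.
all_data = pd.concat([udata_df, pred_matrix], ignore_index=True)

# uid and iid stay int64; rat is upcast to float64 to hold both inputs.
print(all_data.dtypes)
```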
jezrael
The with open approach is unnecessary in your case, as you can simply concatenate the two DataFrames and then save the result to CSV using only pandas, like below:

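# The default axis=0 stacks the rows; axis=1 would place the frames side by side.
df = pd.concat([udata_df, pred_matrix])
df.to_csv('AllData.csv', index=False, encoding='utf-8')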
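
Whether `pd.concat` stacks rows or joins columns depends on its `axis` argument. A short sketch on made-up frames (`a`, `b`, `rows`, and `cols` are illustrative names) comparing the two resulting shapes:

```python
import pandas as pd

a = pd.DataFrame({'uid': [1, 2], 'iid': [8, 9], 'rat': [0, 3]})
b = a * 10

rows = pd.concat([a, b], ignore_index=True)  # axis=0 (default): 4 rows, 3 columns
cols = pd.concat([a, b], axis=1)             # axis=1: 2 rows, 6 columns, names repeated

print(rows.shape)  # (4, 3)
print(cols.shape)  # (2, 6)
```

For combining two sets of ratings into one table, the row-wise form is most likely the one wanted.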

Jape