1

I am trying to combine 3 lists, each of size 10 rows x 25 columns, into one list of size 10 x 75, and then write the result to a csv file. But all my attempts at using straightforward approaches fail, and I cannot figure out why - and how to fix it.

The lists are produced by one and the same function. The column heads of the lists are identical, except for a number that has been added to the column name to distinguish them, as follows.

def do_stuff(x):
    producing 10 x 25 list

# Producing list 1 from document 1
df = pd.read_csv("file1.csv", encoding = "utf-8") # open file 1
df1 = do_stuff(df)

# Producing list 2 from document 2
df = pd.read_csv("file2.csv", encoding = "utf-8") # open file 2
df2 = do_stuff(df)

# Producing list 3 from document 3
df = pd.read_csv("file3.csv", encoding = "utf-8") # open file 3
df3 = do_stuff(df)

The data in each of the lists df1, df2, df3 looks something like this:

    # List df1                      # List df2
    [col_A1  Col_B1  Col_C1 etc     [col_A2  Col_B2  Col_C2 ...
 0  0.96     0.13    0.63.       0  0.56     0.22    0.31
 1  0.21     0.33    0.44.       1  0.37     0.92    0.77
 ..                              ..
 9  0.11     0.12    0.99]       9  0.34     0.54    0.42] 

Attempt 1 To combine these lists into one DataFrame, I have tried:

 df1 + df2 + df3 

However, this leads to NaN's across the dataset:

    # List df1                      # List df2
    [col_A1  Col_B1  Col_C1 etc     [col_A2  Col_B2  Col_C2 ...
 0  NaN      NaN     NaN         0  NaN      NaN     NaN
 1  NaN      NaN     NaN         1  NaN      NaN     NaN
 ..                              ..
 9  NaN      NaN     NaN]        9  NaN      NaN     NaN]

Attempt 2 I have also tried the following, but get an error:

df_total.append(df1) # add result to list
df_total.append(df2) # add result to list
df_total.append(df3) # add result to list
df_total.to_csv("result.csv", index=False)

AttributeError: 'list' object has no attribute 'to_csv'

Attempt 3 I have tried to make sure to get a DataFrame which can be written to a csv:

df_total.append(df1) # add result to list
df_total.append(df2) # add result to list
df_total.append(df3) # add result to list
df = pd.DataFrame([df_total])
df = pd.concat(df)

But this gives NaN values across the dataset:

    col_A1   col_B1  col_C1 ... col_A2  col_B2  col_C2 ...
 0  0.96     NaN     0.63   ... 0.56    NaN     NaN
 1  NaN      0.33    NaN    ... NaN     0.92    NaN
 ..                         ...     
 9  0.11     NaN     NaN    ... 0.34    0.54    0.42] 

Question: Can anyone explain what is going on? And how to get the three lists df1, df2, df3 into one DataFrame that looks like below and can be written to a csv?

    col_A1   col_B1  col_C1 ... col_A2  col_B2  col_C2 ...
 0  0.96     0.13    0.63   ... 0.56    0.22    0.31
 1  0.21     0.33    0.44   ... 0.37    0.92    0.77
 ..                         ...     
 9  0.11     0.12    0.99   ... 0.34    0.54    0.42] 
jpp
  • 159,742
  • 34
  • 281
  • 339
twhale
  • 725
  • 2
  • 9
  • 25
  • What are train0 / train_df / train1 ? – jpp Apr 12 '18 at 13:42
  • 1
    Sorry, this is my mistake! This should be df1 + df2 + df3. I have corrected this – twhale Apr 12 '18 at 13:43
  • see https://pandas.pydata.org/pandas-docs/stable/merging.html first segement for concatenating dataframes. – PeterE Apr 12 '18 at 13:45
  • 2
    Have you tried `res = pd.concat([df1, df2, df3], axis=1)` ? – jpp Apr 12 '18 at 13:46
  • Yes that works...! Thank you so much. – twhale Apr 12 '18 at 13:50
  • @jpp Can you explain what went wrong in my attempts? Especially why the NaNs showed up? Very puzzling. – twhale Apr 12 '18 at 13:51
  • 1
    @twhale - Try check [this perfect answer](https://stackoverflow.com/q/49620538/2901002) – jezrael Apr 12 '18 at 13:54
  • My question is different, but the answer from that post contains an answer that works for my question also. So I have indicated that. Thanks a lot. But I still do not understand why the solutions I tried especially the third one did not work. – twhale Apr 12 '18 at 14:01

0 Answers0