0

I want to add multiple columns in an existing csv. My data looks like this:

50451  51151  53266
 100    100    100
  1      1      1

where the data starting with (50...) are the columns and below them are the rows. I have another dataset which looks similar to this:

50014  50013  54567
 50     100    100

I am using this code to change it into csv:

df.to_csv('fort.csv', index = False)

but what it does is , it replaces the old columns with new ones. Since , I have to add multiple columns , I can't use df['50014'] everytime. If you guys could suggest something , I would greatly appreciate it.

vesuvius
  • 435
  • 4
  • 20
  • If the two files have the same order and amount of columns you can simply append one file to another as if it was a text file (which a csv is). You'd have to remove the duplicate column headers of course though. –  May 16 '19 at 12:25
  • The amount of columns are not similar. – vesuvius May 16 '19 at 12:26

1 Answers1

3

You can use merge suffixes, to achieve this. In case there are columns with the same names the suffixes will fix the problem.

suffixes : tuple of (str, str), default (‘_x’, ‘_y’)

Suffix to apply to overlapping column names in the left and right side, respectively. To raise an exception on overlapping columns use (False, False).


r = pd.merge(df, df, right_index=True, left_index=True, suffixes=('_left', '_right'), how='outer')
print(r)

For simplicity, I took the same df again:

   50451  51151  53266
0    100    100    100
1      1      1      1
   50451_left  51151_left  53266_left  50451_right  51151_right  53266_right
0         100         100         100          100          100          100
1           1           1           1            1            1            1

Important is to use the outer join.


Merge is same as join except join may be handy because it is by default used on indices (indexes). They share the same code base ;).

Community
  • 1
  • 1
prosti
  • 42,291
  • 14
  • 186
  • 151
  • All of the columns start with '5' , so can I use preffix instead of suffix? – vesuvius May 16 '19 at 12:28
  • Columns may start with `5` but whether the two columns are the same or not depends on remaining ciphers, right. I would take suffixes as this is provided by default. – prosti May 16 '19 at 12:30
  • IF I use merge @prosti , I have to mention the second dataframe everytime to merge it with the first one like df1 = pd.merge(df2) , which is not feasible for me as I have around 10,000 dataframes. – vesuvius May 16 '19 at 12:37
  • I tried it , but it is replacing the data present in my previous csv. I have created a csv with 10,000 dataframes but now I want to add 100 more into it. I don't think I'll be able to do it with this code – vesuvius May 16 '19 at 12:46
  • This looks like already answered question: https://stackoverflow.com/questions/20906474/import-multiple-csv-files-into-pandas-and-concatenate-into-one-dataframe – prosti May 16 '19 at 12:49
  • Ok. I'll try it and get back to you. Thanks :) – vesuvius May 16 '19 at 12:53
  • Sorry @prosti but the answer doesn't work for me as I already have large amount of data in the csv and I want to add more data which is in same format. – vesuvius May 16 '19 at 13:23
  • NO worries, ask another question specifying the right and important details in the title. – prosti May 16 '19 at 13:34