I'm a beginner in Python and I need help with an issue. I have 126 files, each containing more than 12 columns and more than 1000 lines. I want to create one file that contains columns 1 and 2 of all the files.

For example, suppose file 1 contains 5 columns, A to E:

A       B     C     D    E
name1   2     13    98   6
name2   7     8     67   12
name3   56    67    9    7

and file 2 also contains 5 columns, A to E:

A       B     C     D    E
name1   3     13    98   6
name2   9     8     67   12
name3   12    67    9    7

I want to create a final file which contains column A and column B of each file

so the result will be

A       B     B   
name1   2     3 
name2   7     9  
name3   56    12    

Please tell me if you need any other information or clarification. Thank you very much.

SAAD
    Read up on basic pandas functions, such as `merge`: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.merge.html – Paul Dec 15 '21 at 14:28

1 Answer

Simply merge the ['A', 'B'] subsets of the two DataFrames on 'A':

df[['A','B']].merge(df2[['A','B']], on=['A'])

Both frames have a column named B, so merge renames the two overlapping columns by appending a suffix to each. The default suffixes are ("_x", "_y"), which gives B_x and B_y.

You can choose your own suffixes with the `suffixes` parameter, for example:

df[['A','B']].merge(df2[['A','B']], on=['A'], suffixes=['', '2'])
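To make the one-liner above concrete, here is a small runnable sketch using the sample values from the question. The frames `df` and `df2` are built by hand here purely for illustration (only columns A to C are included); in practice they would come from reading the files.

```python
import pandas as pd

# Sample data copied from the question (columns D and E omitted for brevity).
df = pd.DataFrame({"A": ["name1", "name2", "name3"], "B": [2, 7, 56], "C": [13, 8, 67]})
df2 = pd.DataFrame({"A": ["name1", "name2", "name3"], "B": [3, 9, 12], "C": [13, 8, 67]})

# Keep only A and B from each frame, then join the rows that share the same A.
# The left B keeps its name (empty suffix); the right B becomes "B2".
merged = df[["A", "B"]].merge(df2[["A", "B"]], on="A", suffixes=("", "2"))
print(merged)
```

Note that `merge` defaults to an inner join, so a name that appears in only one of the two frames would be dropped from the result.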
Paul
  • Thank you, but what about the loop to iterate through the 126 files, because I have a folder that contains the 126 files. – SAAD Dec 15 '21 at 15:04
    Maybe this helps: https://stackoverflow.com/questions/44327999/python-pandas-merge-multiple-dataframes – Paul Dec 15 '21 at 15:06
    To import multiple files from a folder you can look at this answer: https://stackoverflow.com/questions/20906474/import-multiple-csv-files-into-pandas-and-concatenate-into-one-dataframe – Paul Dec 15 '21 at 15:07
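
For the loop over the 126 files asked about in the comments, the two linked approaches can be combined roughly as follows. This is only a sketch under several assumptions that the question does not confirm: the files are whitespace-separated text with a header row, they match the pattern `*.txt`, and the first column has the same name in every file. The function name `combine_first_two_columns` and the `B_<filename>` column naming are made up for this example.

```python
from functools import reduce
from pathlib import Path

import pandas as pd


def combine_first_two_columns(folder, pattern="*.txt"):
    """Merge the first two columns of every matching file on the first column."""
    frames = []
    for path in sorted(Path(folder).glob(pattern)):
        # Assumption: whitespace-separated values with a header line.
        part = pd.read_csv(path, sep=r"\s+", usecols=[0, 1])
        value = part.columns[1]
        # Rename the value column after the file so the names never collide.
        frames.append(part.rename(columns={value: f"{value}_{path.stem}"}))
    if not frames:
        raise FileNotFoundError(f"no files matching {pattern!r} in {folder}")
    # Assumption: the key column (first column) has the same name in every file.
    key = frames[0].columns[0]
    # Merge the frames pairwise, left to right, on the shared key column.
    return reduce(lambda left, right: left.merge(right, on=key), frames)
```

With a hypothetical folder named `data`, the result could then be written out with `combine_first_two_columns("data").to_csv("combined.txt", sep="\t", index=False)`. Because each merge is an inner join by default, only names present in every file survive; passing `how="outer"` to `merge` would keep the others with missing values.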