Hello, I have a program that receives multiple CSV files, and I want to append them into a single file. Here is a simple example of what I have and what I want.

File1.csv:

A  B  C  D
1  2  3  4
2  3  4  5

File2.csv:

A  B  C  D
8  8  8  8
9  9  9  9

outputFile.csv:
A  B  C  D
1  2  3  4
2  3  4  5
8  8  8  8
9  9  9  9

This is the required output. To get it, I wrote the following code, which works fine:

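import pandas as pd

file1 = "File1.csv"
df1 = pd.read_csv(file1)
file2 = "File2.csv"
df2 = pd.read_csv(file2)

# Note: DataFrame.append was removed in pandas 2.0;
# pd.concat([df1, df2]) gives the same result on newer versions.
results = df1.append(df2)
results.to_csv("outputFile.csv", index=False)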

This works, but now the input files come from a UI as a list, so I wrote the following code for that case, and it does not work:

datafiles = ["File1.csv","File2.csv"]
dataframes=[]
# df = pd.DataFrame()
for files in datafiles:
    df1= pd.read_csv(files)
    dataframes.append(df1)

    dataframes.to_csv("mergeOutput.csv", index=False)

I don't want to read each file separately, which is why I used a for loop to collect all the data in dataframes, but I guess this is not the correct way. Please suggest the correct way to do it. I also want to remove duplicates from the file. Please let me know if anything is not clear. Thanks in advance.

As suggested by @Thotsaphon Sirikutta (Import multiple csv files into pandas and concatenate into one DataFrame), I am now able to get the output file I need, but every time I get 3 or 4 extra empty columns named "Unnamed". Why am I getting these extra columns, and how can I remove them without using drop()? This is the code:

datafiles = ["File1.csv","File2.csv"]
dfs=[]

for filename in datafiles:
    dfs.append(pd.read_csv(filename))

mergeData = pd.concat(dfs,sort=False)
mergeData.to_csv("mergeOutput.csv", index=False)
snehil singh
  • look at this topic https://stackoverflow.com/questions/20906474/import-multiple-csv-files-into-pandas-and-concatenate-into-one-dataframe – Thotsaphon Sirikutta Apr 30 '19 at 04:41
  • I guess you need a for loop anyway to read the various files. The only problem I see is that `dataframes.to_csv()` should be outside the for loop. – hacker315 Apr 30 '19 at 04:43
  • @hacker315 Yes, correct. If I read the files separately and store each in a dataframe it will work, but I don't have only 2 or 3 files, there might be 10, which is why I'm trying to do it this way. – snehil singh Apr 30 '19 at 04:46
  • @ThotsaphonSirikutta It's working, but I'm getting some additional empty columns named "Unnamed". – snehil singh Apr 30 '19 at 04:52
  • First, create an empty data frame with the column names (I hope you have fixed column names for the given dataset), like df = pd.DataFrame(columns=COLUMN_NAMES), and then read and append all dataframes in the loop. Note: in your code, instead of appending data to a dataframe you are appending it to a list, which is incorrect. – Kishore Kolla Apr 30 '19 at 05:00
  • @KishoreKolla I have tried creating the empty dataframe that way, but it's not working. – snehil singh Apr 30 '19 at 05:03
  • @KishoreKolla I'm able to get it now by using https://stackoverflow.com/questions/20906474/import-multiple-csv-files-into-pandas-and-concatenate-into-one-dataframe but I'm getting some extra empty columns. Can you help me understand why I'm getting them and how to remove them? – snehil singh Apr 30 '19 at 05:05

1 Answer

Well, if you have multiple csv files with the same columns, you can do something like this:

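import pandas as pd

# arrayFile is the list of CSV paths to merge (placeholder values shown here)
arrayFile = ["File1.csv", "File2.csv"]

opened = []

for file in arrayFile:
    # header=0 takes the column names from the first row; index_col=None
    # keeps pandas from using any data column as the index
    df = pd.read_csv(file, index_col=None, header=0)
    opened.append(df)

# Stack all frames row-wise and renumber the index from 0
frame = pd.concat(opened, axis=0, ignore_index=True)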
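
The question also asks about removing duplicates and writing the merged file. A minimal sketch of the whole round trip follows, assuming all files share the same columns; the helper name `merge_csv_files` and the temporary demo files are hypothetical and only there to make the example self-contained.

```python
import os
import tempfile

import pandas as pd


def merge_csv_files(paths, output_path):
    """Read each CSV in `paths`, stack the rows, drop duplicate rows, write one CSV."""
    frames = [pd.read_csv(path) for path in paths]
    merged = pd.concat(frames, axis=0, ignore_index=True)
    # drop_duplicates keeps the first occurrence of each fully identical row
    merged = merged.drop_duplicates().reset_index(drop=True)
    merged.to_csv(output_path, index=False)
    return merged


# Demo with throwaway files (hypothetical data; the second file repeats one row)
tmp_dir = tempfile.mkdtemp()
file1 = os.path.join(tmp_dir, "File1.csv")
file2 = os.path.join(tmp_dir, "File2.csv")
with open(file1, "w") as fh:
    fh.write("A,B,C,D\n1,2,3,4\n2,3,4,5\n")
with open(file2, "w") as fh:
    fh.write("A,B,C,D\n8,8,8,8\n9,9,9,9\n1,2,3,4\n")

merged = merge_csv_files([file1, file2], os.path.join(tmp_dir, "mergeOutput.csv"))
print(merged)
```

Note that `to_csv` is called once on the concatenated DataFrame, after the loop, not on the list of frames.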

UPDATE

If you are having a problem with the data, it may be something about the file structure that you need to preprocess first. Look at this example I just made on my computer:

[screenshot of the example; image not available]
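
One possible cause of the "Unnamed" columns, offered here as an assumption since the thread never shows the actual files, is trailing delimiters at the end of each line: pandas then sees extra header cells with no name and labels them `Unnamed: N`. The sketch below reproduces that with made-up inline data and keeps only the named columns through a boolean mask, without calling drop().

```python
import io

import pandas as pd

# Hypothetical file content: every line ends with two extra commas,
# as can happen when a spreadsheet exports empty trailing cells.
csv_text = "A,B,C,D,,\n1,2,3,4,,\n2,3,4,5,,\n"

raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))

# Keep only the columns whose name does not start with "Unnamed"
named = raw.loc[:, ~raw.columns.str.startswith("Unnamed")]
print(list(named.columns))
```

If the real files show the same pattern, applying this filter to each frame before `pd.concat` should keep the empty columns out of the merged output.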

Kenry Sanchez
  • Can you please tell me why I'm getting extra columns? I have already tried this code and it works fine, but I'm getting extra empty columns. Why is it creating "Unnamed" columns? – snehil singh Apr 30 '19 at 05:18
  • Are you getting extra columns? That shouldn't happen. I mean, if the columns are the same in all the files, it should not add another column. Otherwise, something is wrong. Are you sure all the files are the same? – Kenry Sanchez Apr 30 '19 at 05:23
  • If the files are not the same, I think you could do this: `frame = pd.concat(opened[["my differents columns"]], axis = 0, ignore_index = True)` – Kenry Sanchez Apr 30 '19 at 05:24
  • I'm sure. It's just a copy of the file, so I don't think there is any difference. – snehil singh Apr 30 '19 at 05:25
  • Well, in the code you are concatenating the whole frame. If you pass just your columns "column 1", "column 2"... you append only the columns that you are interested in. Try it and tell me if I'm wrong. – Kenry Sanchez Apr 30 '19 at 05:27
  • I'm getting this error: list indices must be integers or slices, not list – snehil singh Apr 30 '19 at 05:27
  • @Kenry Sanchez OK, got it. – snehil singh Apr 30 '19 at 05:28
  • OK. Let me try it on my computer and then I'll update the answer. I hope to find the answer and help you with the problem. – Kenry Sanchez Apr 30 '19 at 05:29
  • @Kenry Sanchez I have tried with a different file, which is just a copy of the first file, but it still creates 3 columns: 2 are "Unnamed" and one has a column name that already exists, and all of them are empty. – snehil singh Apr 30 '19 at 05:34
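
The "list indices must be integers or slices, not list" error reported in the comments is what Python raises when a plain list such as `opened` is indexed with a list of column names. A hedged sketch of what the suggestion was probably aiming for is to select the columns on each DataFrame and then concatenate; the sample frames and the `wanted` list below are hypothetical.

```python
import pandas as pd

# Hypothetical frames standing in for the CSVs that were read into `opened`
opened = [
    pd.DataFrame({"A": [1, 2], "B": [2, 3], "extra": [None, None]}),
    pd.DataFrame({"A": [8, 9], "B": [8, 8]}),
]

# Columns to keep (an assumption; use the names your files actually share)
wanted = ["A", "B"]

# Index each DataFrame with the column list, not the Python list that holds them
frame = pd.concat([df[wanted] for df in opened], axis=0, ignore_index=True)
print(frame)
```

This only works if every frame actually contains all of the `wanted` columns; a missing name raises a `KeyError`.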