0

I tried many answers but none of them working for me:

For example this: Import multiple CSV files into pandas and concatenate into one DataFrame

import pandas as pd
import glob
import os

path = r'C:\DRO\DCL_rawdata_files' # use your path
all_files = glob.glob(os.path.join(path , "/*.csv"))

li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)

I have only 2 csv files:

1.csv:

1,1
2,1
3,1
4,1
5,1

2.csv:

6,1
7,1
8,1
9,1

To be fair, this is my routine for merging:

files = glob.glob("data/*.csv")
df = []
for f in files:
    csv = pd.read_csv(f, index_col=None, header=0)
    df.append(csv)
df = pd.concat(df, axis=0, ignore_index=True)
df.to_csv("all.csv")
print(df);

This is the output (print(df)):

   1  1.1    6
0  2  1.0  NaN
1  3  1.0  NaN
2  4  1.0  NaN
3  5  1.0  NaN
4  1  NaN  7.0
5  1  NaN  8.0
6  1  NaN  9.0

And this is the "all.csv":

,1,1.1,6
0,2,1.0,
1,3,1.0,
2,4,1.0,
3,5,1.0,
4,1,,7.0
5,1,,8.0
6,1,,9.0

Whereas I would need all.csv to be:

1,1
2,1
3,1
4,1
5,1
6,1
7,1
8,1
9,1

I'm using Python3.9 with PyCharm 2022.3.1.

Why is my all.csv look like that, and how can I simply read multiple csv into one dataframe for further processing?

Daniel
  • 2,318
  • 2
  • 22
  • 53
  • You should read csv with `header=None` option. Also, if your target is the final `all.csv`, you'd be better off reading the csv as text files and concatenate them. – Quang Hoang Jan 31 '23 at 16:16
  • My target is DataFrame, just I wanted to write it out, for testing the result. – Daniel Jan 31 '23 at 16:19
  • and I have "header=0", isn't it enough? – Daniel Jan 31 '23 at 16:19
  • 1
    No, `header=0` means to take the first row as column names, and the result is what you see. `header=None` means no column names. It should give you the expected result. – Quang Hoang Jan 31 '23 at 16:20

3 Answers3

0

You have to swap header and index_col values:

data = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=0, header=None)
    data.append(df)
df = pd.concat(data, axis=0)
df.to_csv('all.csv', header=False)

Output all.csv:

1,1
2,1
3,1
4,1
5,1
6,1
7,1
8,1
9,1
Corralien
  • 109,409
  • 8
  • 28
  • 52
  • For me if I don't add `index=None` into "to_csv", I got an extra line at the top with "0 1". – Daniel Jan 31 '23 at 16:24
  • This is the header, you have to use `header=False` to avoid this leading row. – Corralien Jan 31 '23 at 16:27
  • So I really should go with `pd.read_csv(f, index_col=None, header=None)` AND `df.to_csv('all.csv', header=False, index=None)` – Daniel Jan 31 '23 at 16:27
  • `header=False` for me eliminates the row headers, not the column headers – Daniel Jan 31 '23 at 16:28
  • Are you sure you are not confusing the two concepts: index (= the number of rows) and columns (= the number of columns)? – Corralien Jan 31 '23 at 16:33
0

I have been successfully using this code to append several Excel files into one dataframe; it also forces certain columns to be string datatypes, which can be changed, of course:

fmask = 'C:/Users/USER/FILEPATH/*.xlsx'

# lst of column names which needs to be string
lst_str_cols = ['ItemID', 'StoreID']
# use dictionary comprehension to make dict of dtypes
dict_dtypes = {x : 'str'  for x in lst_str_cols}

dfs = []
for f in glob.glob(fmask):
    df = pd.read_excel(f, dtype=dict_dtypes)
    dfs.append(df)

# concatenated
dfConcatenate = pd.concat(dfs, ignore_index=True)

Instead of Excel you can change it to CSV.

Dolunaykiz
  • 323
  • 1
  • 8
0
import pandas as pd

df1 = pd.read_csv('csvfile1.csv', header=None)
df1 = pd.read_csv('csvfile2.csv', header=None)

df = pd.concat([df1, df2], ignore_index=True)

for index, row in df.iterrows():
    print(row[0], row[1])
111
  • 42
  • 1
  • 9