2

I have tried to work on several csv files using glob, for example:

import glob 
import pandas as pd
import numpy as np
import csv

# Read all csv files with same file name in the folder
filenames = sorted(glob.glob('./16_2018-02*.csv'))

for f in filenames:
    df = pd.read_csv(f, names=['Date','RSSI','Data','Code'], 
    index_col=None)

    # Slicing information 
    df["ID"] = df["Data"].str.slice(0,2)
    df["X"] = df["Data"].str.slice(2,4)

    # Save the output data to csv with different name 
    df.to_csv(f'{f[:-4]}-train.csv', index=False)

In the end of the code, I used to save each dataframe into a new csv file with different name. Considering now I have so many csv data to work with, I want to concatenate them without first writing into each csv file. How should I do that?

Original dataset first 5 rows:

Date                            RSSI    Data                        Code        
2018-02-20T00:00:20.886+09:00   -99 1068ffd703d101ec77f425ea98b201  F2D5    
2018-02-20T00:00:21.904+09:00   -95 103cffbc032901ee77f49dea98b301  F2D5        
2018-02-20T00:00:22.415+09:00   -97 103cffbc032901ee77f49dea98b301  F2D5         
2018-02-20T00:00:46.580+09:00   -96 10fdfda803ff01f477f49dfd98cb03  F2D1        
2018-02-20T00:00:48.593+09:00   -96 101bfed3037401f577f49dfe98cd03  F2D6    

After:

Date                            RSSI    Data                        Code    ID  X   
2018-02-20T00:00:20.886+09:00   -99 1068ffd703d101ec77f425ea98b201  F2D5    16  0.065384    
2018-02-20T00:00:21.904+09:00   -95 103cffbc032901ee77f49dea98b301  F2D5    16  0.065340        
2018-02-20T00:00:22.415+09:00   -97 103cffbc032901ee77f49dea98b301  F2D5    16  0.065340         
2018-02-20T00:00:46.580+09:00   -96 10fdfda803ff01f477f49dfd98cb03  F2D1    16  0.065021    
2018-02-20T00:00:48.593+09:00   -96 101bfed3037401f577f49dfe98cd03  F2D6    16  0.065051    
npm
  • 643
  • 5
  • 17
  • Can you supply a couple sample data sets and what you would want the outcome to look like? – Ian Thompson Feb 25 '19 at 01:37
  • @IanThompson data sets contains many rows with 4 columns ('Date','RSSI','Data','Code'). In the process I added 2 new columns (ID, X) then all files are saved individually in csv files (e.g from 10 data sets in the folder, it wll be saved to 10 csv output files)... – npm Feb 25 '19 at 01:41
  • @IanThompson ..while now the outcome I would like to have is not several csv files, but one csv files with one header concatenated. – npm Feb 25 '19 at 01:42
  • Can you post what that would look like? For example, what do the first 5 rows of two of the `csv`s look like? Post them in code-formatted text so that we can copy them directly and can get you an answer faster. – Ian Thompson Feb 25 '19 at 01:45
  • @IanThompson added in the question. – npm Feb 25 '19 at 01:52

1 Answers1

1

Try the below code [for appending all the files to 1 file]:

filenames = sorted(glob.glob('./16_2018-02*.csv'))
appended_data=[] #create a list
for f in filenames:
    df = pd.read_csv(f, names=['Date','RSSI','Data','Code'], 
    index_col=None)

    # Slicing information 
    df["ID"] = df["Data"].str.slice(0,2)
    df["X"] = df["Data"].str.slice(2,4)
    appended_data.append(df) #append to the list
appended_data = pd.concat(appended_data, axis=1) #concat them together
#remove axis=1 if need to append vertically

The appended_data is now a dataframe with all files appended together post which you can export the same to csv/excel.

anky
  • 74,114
  • 11
  • 41
  • 70
  • I tried with no "axis=1" since I need it to be vertically appended, but got this error: ValueError: Plan shapes are not aligned. Trying to figure out why because the no. of columns should be the same. – npm Feb 25 '19 at 02:35
  • @npm check this: https://stackoverflow.com/questions/26226343/pandas-concat-gives-error-valueerror-plan-shapes-are-not-aligned – anky Feb 25 '19 at 02:38
  • 1
    Instead I tried with smaller dataset and your solution worked, thanks! I might just need to check on each dataset later. – npm Feb 25 '19 at 02:50