I am trying to create a DataFrame with `read_csv`. In place of the path, I want to give a regex pattern so that all files matching this pattern are read. However, I don't get the files I expect.

Please help me solve the problem.

import pandas as pd

# path: the directory plus the common part of the file names
# col: list of column names
df = pd.read_csv(path + r"^\d{8}_\d{6}$", sep="|", header=None, names=col)

But this line does not fetch the files that match the pattern; the regular expression is used literally as part of the file name. Please help me solve this.

nitin3685
Sudhakar
  • Please provide a code sample, including the error traceback, and improve the syntax of your post. – Arn Nov 14 '19 at 19:11
  • Please edit your question to include a minimal reproducible example, including sample input, sample output, and code for what you have tried so far. – G. Anderson Nov 14 '19 at 19:13
  • I have already edited the question, please check. – Sudhakar Nov 15 '19 at 04:54
  • Let me see if I understand your question correctly. You want to read a set of files under `path` that match a specific pattern and create a single DataFrame from those files? Please confirm this is what you are looking for. Check whether this link helps: https://stackoverflow.com/questions/20906474/import-multiple-csv-files-into-pandas-and-concatenate-into-one-dataframe – nitin3685 Nov 15 '19 at 04:59
  • Yes @nitin3685. `path` contains both the directory and the common part of the file names. – Sudhakar Nov 15 '19 at 05:12

1 Answer


The solution has two steps. First, find all paths that match the pattern. Second, read each matching file into a DataFrame and concatenate the results. pandas does not handle the first step for you: `read_csv` takes a single path and does not expand wildcards or regular expressions. So you can use the glob module for that.
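
Note that glob patterns are shell-style wildcards, not regular expressions, so the question's regex `^\d{8}_\d{6}$` has to be translated. A quick standard-library check (the file names are made up for illustration) that the translated pattern accepts the same names as the regex:

```python
import fnmatch
import re

# Regex from the question: 8 digits, an underscore, 6 digits, nothing else.
# fullmatch() anchors at both ends, so ^ and $ are not needed.
regex = re.compile(r"\d{8}_\d{6}")
# Glob translation: each [0-9] matches exactly one digit.
glob_pattern = "[0-9]" * 8 + "_" + "[0-9]" * 6

# Hypothetical file names, used only for this comparison.
names = [
    "20191114_190512",      # matches both
    "20191114_19051",       # one digit too few
    "2019111_190512",       # one digit too few
    "20191114_190512.csv",  # extension is not covered by either pattern
    "report.csv",
]

for name in names:
    by_regex = regex.fullmatch(name) is not None
    by_glob = fnmatch.fnmatchcase(name, glob_pattern)
    print(f"{name:22} regex={by_regex} glob={by_glob}")
```

One difference to keep in mind: in Python 3, `\d` also matches non-ASCII digits, while `[0-9]` only matches ASCII ones. Neither pattern matches a name with an extension such as `.csv`; if your files have one, append it to the pattern.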

Code sample:

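import glob

import pandas as pd

root_path = './'
col = ['col1', 'col2']  # placeholder: use your own list of column names

# glob uses shell-style wildcards, not regular expressions: each [0-9]
# matches exactly one digit, so this mirrors the regex ^\d{8}_\d{6}$
datasheet_path_pattern = root_path + ('[0-9]' * 8) + '_' + ('[0-9]' * 6)
datasheet_paths = sorted(glob.glob(datasheet_path_pattern))

datasheet = []
for datasheet_path in datasheet_paths:
    # header=None: the files have no header row; names supplies the columns
    df = pd.read_csv(datasheet_path, sep="|", header=None, names=col)
    datasheet.append(df)

# pd.concat raises ValueError if no file matched (empty list)
datasheet = pd.concat(datasheet, ignore_index=True)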
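
If you would rather keep the regular expression from the question than translate it into a glob pattern, a minimal sketch is to list the directory yourself and filter the names with `re.fullmatch`. The helper name `read_matching_csvs` is hypothetical, and it assumes all matching files share the same layout so they can be concatenated:

```python
import re
from pathlib import Path

import pandas as pd


def read_matching_csvs(directory, name_regex, **read_csv_kwargs):
    """Read every file in `directory` whose name fully matches `name_regex`
    and concatenate them into one DataFrame.

    Hypothetical helper for illustration; extra keyword arguments are
    passed straight through to pd.read_csv.
    """
    pattern = re.compile(name_regex)
    # Step 1: filter file names with the regex (fullmatch acts like ^...$).
    paths = sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and pattern.fullmatch(p.name)
    )
    if not paths:
        raise FileNotFoundError(f"no file in {directory!r} matches {name_regex!r}")
    # Step 2: read each matching file and stack the results.
    frames = [pd.read_csv(p, **read_csv_kwargs) for p in paths]
    return pd.concat(frames, ignore_index=True)
```

For the question's setup this would be called as `read_matching_csvs(directory, r"\d{8}_\d{6}", sep="|", header=None, names=col)`. Since `path` in the question also carries a common file-name prefix, that prefix would go into the regex, e.g. `re.escape(prefix) + r"\d{8}_\d{6}"`, with only the directory passed as the first argument (`directory` and `prefix` are placeholder names). Raising `FileNotFoundError` when nothing matches gives a clearer message than the `ValueError` from calling `pd.concat` on an empty list.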
Trần Đức Tâm