
I have 200 files of different shapes that I need to concatenate, column after column, into one file. All 200 files are in one directory, so I tried the following script.

import os
import pandas as pd

path = '/data'
files = os.listdir(path)

files_txt = [os.path.join(path, i) for i in files if i.endswith('tsv')]

# Read each file and keep column 6 as a DataFrame
dfs = [pd.DataFrame.from_csv(x, sep='\t')[[6]] for x in files_txt]
# Concatenate them side by side
merged = pd.concat(dfs, axis=1)

But it throws the following ValueError, as the shape is different for each of these files. I would like to have a solution. Thank you.

Here is the error:

ValueError: Shape of passed values is (149, 13864), indices imply (149, 13860)
ARJ
  • Maybe this can help: https://stackoverflow.com/questions/27719407/pandas-concat-valueerror-shape-of-passed-values-is-blah-indices-imply-blah2 – Shaido Aug 14 '18 at 01:46
  • No, I cannot reset the index on a list; dfs is a list of files read as DataFrames. – ARJ Aug 14 '18 at 01:47
  • You could try doing it after `from_csv`, but I'm not sure it will help. Do the files have the same number of rows, or can it differ? If it differs, you can pass `ignore_index=True` to `concat`. – Shaido Aug 14 '18 at 01:49
  • ignore_index throws the same error message – ARJ Aug 14 '18 at 01:55
  • why are you using `DataFrame.from_csv`? That API was deprecated years ago. – cs95 Aug 14 '18 at 02:05

1 Answer


The index contains duplicates, so concat fails: with axis=1 it joins the DataFrames on their index. Resetting the index of each DataFrame first avoids this:

dfs = [pd.DataFrame.from_csv(x, sep='\t')[[6]].reset_index(drop=True) for x in files_txt]
##Concatenate it
merged = pd.concat(dfs, axis=1)
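Since `DataFrame.from_csv` has been removed from recent pandas releases, here is a rough sketch of the same idea with `pd.read_csv`. It is only an illustration under some assumptions: the two tiny TSV files, their `id`/`value` columns, and the choice of column position 0 are all made up so that the snippet runs on its own, and `index_col=0` is there to imitate `from_csv`'s default of using the first column as the index.

```python
import glob
import os
import tempfile

import pandas as pd

# Create two small hypothetical TSV files so the sketch runs on its own.
tmp_dir = tempfile.mkdtemp()
pd.DataFrame({"id": ["g1", "g2", "g3"], "value": [10, 20, 30]}).to_csv(
    os.path.join(tmp_dir, "a.tsv"), sep="\t", index=False)
# This file is shorter and repeats an id, so its index will not be unique.
pd.DataFrame({"id": ["g1", "g1"], "value": [1, 2]}).to_csv(
    os.path.join(tmp_dir, "b.tsv"), sep="\t", index=False)

files_txt = sorted(glob.glob(os.path.join(tmp_dir, "*.tsv")))

dfs = []
for file_path in files_txt:
    # index_col=0 imitates from_csv, which used the first column as the index.
    df = pd.read_csv(file_path, sep="\t", index_col=0)
    # Keep one column by position (0 here, an assumption), then drop the labels
    # so every frame gets a unique 0..n-1 index.
    col = df.iloc[:, [0]].reset_index(drop=True)
    # Name the column after its file so the merged columns stay distinguishable.
    col.columns = [os.path.basename(file_path)]
    dfs.append(col)

merged = pd.concat(dfs, axis=1)
print(merged.shape)  # (3, 2)
```

Because the rows are now matched by position, shorter files are padded with NaN at the bottom. If the original index labels matter, dropping them is probably not what you want, and the duplicates would need to be resolved instead.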

You can check whether each index is unique with:

for x in dfs:
    print(x.index.is_unique)
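With 200 files, a column of True/False values does not tell you much about where the problem is. A small helper along these lines could report which files have duplicated labels and what those labels are. The function name `report_duplicate_labels` and the file names are hypothetical, just for illustration.

```python
import pandas as pd


def report_duplicate_labels(frames, names):
    """Return {name: duplicated index labels} for frames whose index is not unique."""
    problems = {}
    for name, frame in zip(names, frames):
        if not frame.index.is_unique:
            # duplicated() marks every repeat after the first occurrence of a label.
            dupes = frame.index[frame.index.duplicated()].unique()
            problems[name] = dupes.tolist()
    return problems


# Hypothetical frames standing in for two of the files.
ok = pd.DataFrame({"A": [1, 2]})
bad = pd.DataFrame({"A": [1, 2, 3]}, index=[1, 1, 2])

print(report_duplicate_labels([ok, bad], ["ok.tsv", "bad.tsv"]))
```

In the question's setup you would pass `dfs` and `files_txt` as the two arguments.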

To reproduce the error:

import pandas as pd

df1 = pd.DataFrame({'A': [1, 2]})
df2 = pd.DataFrame({'A': [1, 2]}, index=[1, 1])  # duplicated index label
pd.concat([df1, df2], axis=1)

ValueError: Shape of passed values is (2, 5), indices imply (2, 3)

BENY