Edit: A better loop at the bottom too. p is my abbreviation for pandas.

I am attempting to bring in a number of spectra, available as .csvs (without headers), merge them, and drop some columns. These spectra are initially available in a two column format:

Each input file (n rows):

col1  col2
X1    Y1
...   ...
Xn    Yn

Merged result from m spectra:

col1-1  col1-2  col2-1  col2-2  ...  colm-1  colm-2
X1-1    Y1-1    X2-1    Y2-1    ...  Xm-1    Ym-1
...     ...     ...     ...     ...  ...     ...
X1-n    Y1-n    X2-n    Y2-n    ...  Xm-n    Ym-n

All of the col1 (X) columns are redundant. Once I drop either every col1 or all but the first col1, the spectra are ready to be used in a few different tools I've built.

The problem lies in my intake loop:

extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]

mergespec=p.DataFrame()
for f in all_filenames:
    file = p.read_csv(f, header=None, names=["WVNB", "Int"]),
    filemerge = p.merge(file, mergespec, on="WVNB", how='outer')

The object 'file' is returned as a tuple. The code fails on the merge step, which is rejected with "Can only merge Series or DataFrame objects, a <class 'tuple'> was passed".

I can confirm that 'all_filenames' is populated correctly, and that read_csv works fine outside of this loop. In fact, I have a similar loop in another notebook that concatenates the spectra fine (which I use in figure generation).

Dear readers, I am very new to Python and my code is 90% script-kiddie robbery. Please help me understand why my p.read_csv is returning a tuple, or how I biffed this loop. Thanks!

Edit: malvoisen and Vishwas both answered the question about the tuple, and their fix worked. It exposed other problems with matching column names. However, after reading Pandas Merging 101, I found a more parsimonious loop:

filemerge = p.concat([p.read_csv(f, header = None, names=["cm^-1", f]) for f in all_filenames], axis=1)

This replaces everything from mergespec=p.DataFrame() onward and met my objective with fewer lines. Also, since f was both a column name and a filename, I used df.columns.str.rstrip('.0.csv') and lstrip to leave only the relevant sample ID from the filename as the column names.
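One caveat on the rstrip step: rstrip (both Python's built-in and the pandas .str version) treats its argument as a set of characters to strip, not as a literal suffix, so it can eat into the sample ID itself. The sketch below shows the difference in plain Python; the file names and the strip_suffix helper are hypothetical and only for illustration.

```python
def strip_suffix(name, suffix):
    """Remove suffix only if name really ends with it (hypothetical helper)."""
    return name[: -len(suffix)] if suffix and name.endswith(suffix) else name


# Hypothetical file names, not taken from the original post.
names = ["spec.0.csv", "sample100.0.csv"]

# rstrip removes any trailing run of the characters '.', '0', 'c', 's', 'v'.
print([n.rstrip(".0.csv") for n in names])         # ['spe', 'sample1']

# Suffix removal leaves the sample ID intact.
print([strip_suffix(n, ".0.csv") for n in names])  # ['spec', 'sample100']
```

On Python 3.9 or newer, str.removesuffix does the same job as the helper.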

CPO

2 Answers

You have a trailing comma at the end of the file = ... line. Python treats that as a one-element tuple, where file[0] is your DataFrame; more comma-separated values would show up as file[1] and so on.
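A minimal sketch of that behaviour, using a plain dict as a stand-in for the DataFrame (the variable names are made up for illustration):

```python
# Stand-in for the DataFrame that read_csv would return (hypothetical value).
value = {"WVNB": [400, 401]}

with_comma = value,     # trailing comma: Python builds a 1-tuple
without_comma = value   # no comma: the object itself

print(type(with_comma).__name__)     # tuple
print(type(without_comma).__name__)  # dict
print(with_comma[0] is value)        # True
```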

Just remove that comma and you are good to go. Or if the comma is close to your heart use file[0] in your merge statement.
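For reference, here is a rough sketch of the loop with the comma removed. It is not the asker's exact code: the in-memory strings stand in for the .csv files, and naming each intensity column after its file (instead of "Int") plus the use of functools.reduce are my assumptions, chosen so the merged columns do not collide.

```python
import io
from functools import reduce

import pandas as p  # the question abbreviates pandas as p

# Hypothetical in-memory stand-ins for the headerless two-column .csv files.
fake_files = {
    "a.csv": "400,0.1\n401,0.2\n",
    "b.csv": "400,1.1\n401,1.2\n",
}

frames = []
for name, text in fake_files.items():
    # No trailing comma, so the DataFrame itself is appended.
    # The intensity column is named after the file to keep columns distinct.
    frames.append(p.read_csv(io.StringIO(text), header=None, names=["WVNB", name]))

# Merge the frames pairwise on the shared wavenumber column.
filemerge = reduce(
    lambda left, right: p.merge(left, right, on="WVNB", how="outer"), frames
)
print(filemerge)
```

With real files, io.StringIO(text) would be replaced by the file name from glob.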

malvoisen

Try this snippet and see if it works.

import glob
import pandas as p

extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]

mergespec=p.DataFrame()
for f in all_filenames:
    # Pass the filename itself and drop the trailing comma,
    # so file is a DataFrame rather than a 1-tuple.
    file = p.read_csv(f, header=None, names=["WVNB", "Int"])
    filemerge = p.merge(file, mergespec, on="WVNB", how='outer')
Vishwas