Pandas read_csv creates tuple

Question

Edit: A better loop at the bottom too. p is my abbreviation for pandas.

I am attempting to bring in a number of spectra, available as .csvs (without headers), merge them, and drop some columns. These spectra are initially available in a two column format:

col1  col2
X1    Y1
...   ...
Xn    Yn

With M such spectra, the merged table I want to make is:

col1-1  col1-2  col2-1  col2-2  ...  colm-1  colm-2
X1-1    Y1-1    X2-1    Y2-1    ...  Xm-1    Ym-1
...     ...     ...     ...     ...  ...     ...
X1-n    Y1-n    X2-n    Y2-n    ...  Xm-n    Ym-n

Here every Col1 (the X column) is redundant. Dropping either all of the Col1 columns or all but the first leaves the spectra ready for use in a few different tools I've built.

The problem lies in my intake loop:

extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]

mergespec=p.DataFrame()
for f in all_filenames:
    file = p.read_csv(f, header=None, names=["WVNB", "Int"]),
    filemerge = p.merge(file, mergespec, on="WVNB", how='outer')

The object 'file' is returned as a tuple. The code fails on the merge step, which is rejected because "Can only merge Series or DataFrame objects, a class 'tuple' was passed".

I can confirm that 'all_filenames' is populated correctly, and that read_csv works fine outside of this loop. In fact, I have a similar loop in another notebook that concatenates the spectra fine (which I use in figure generation).

Dear readers, I am very new to Python and my code is 90% script kiddy robbery. Please help me understand why my p.read_csv is returning a tuple, or how I biffed this loop. Thanks!

Edit: malvoisen and Vishwas both answered the question about the tuple generation, and their fix did work. It opened other problems with matching column names. However, after reading Pandas Merging 101 I found a more parsimonious loop:

filemerge = p.concat([p.read_csv(f, header = None, names=["cm^-1", f]) for f in all_filenames], axis=1)

This replaces everything from mergespec=p.DataFrame() onward and met my objective with fewer lines. Also, since f was both a filename and a column name, I used df.columns.str.rstrip('.0.csv') and lstrip to leave only the relevant sample ID from the filename as the column names.
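One caveat on the rstrip step: rstrip treats its argument as a set of characters to strip, not as a literal suffix, so it can eat into a sample ID that happens to end in one of those characters. The sketch below shows the difference with a made-up filename; strip_suffix is a hypothetical helper, not part of pandas, and only the '.0.csv' ending is taken from the post.

```python
def strip_suffix(name, suffix):
    """Remove suffix from name only when name really ends with it."""
    if suffix and name.endswith(suffix):
        return name[:-len(suffix)]
    return name

# Hypothetical filename; only the '.0.csv' ending comes from the post above.
filename = "sample100.0.csv"

# rstrip removes any trailing run of the characters '.', '0', 'c', 's', 'v'.
stripped_by_chars = filename.rstrip(".0.csv")
stripped_by_suffix = strip_suffix(filename, ".0.csv")

print(stripped_by_chars)   # sample1
print(stripped_by_suffix)  # sample100
```

Applied to a DataFrame, something like df.columns = [strip_suffix(c, '.0.csv') for c in df.columns] should give the same cleanup without that risk. On Python 3.9 or later, the built-in str.removesuffix does the same job as the helper.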

Can you try making this change to the code and see if it works? file = p.read_csv(list (f), header=None, names=["WVNB", "Int"]) — Vishwas, Jan 06 '20 at 21:48

malvoisen · Answer 1 · 2020-01-09T09:46:27.417

1

You have a comma at the end of the file = ... line. Python treats a trailing comma as building a tuple, so file[0] is your DataFrame, and any further comma-separated values would become file[1] and so on.

Just remove that comma and you are good to go. Or if the comma is close to your heart use file[0] in your merge statement.
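The effect is easy to reproduce without pandas. In this minimal sketch, read_stub is a hypothetical stand-in for p.read_csv (it just returns a dict), and the filename is made up. The only point is what the trailing comma does to the assigned value.

```python
def read_stub(path):
    """Hypothetical stand-in for p.read_csv: returns one object per call."""
    return {"path": path}

with_comma = read_stub("spectrum.csv"),    # trailing comma: a 1-tuple
without_comma = read_stub("spectrum.csv")  # no comma: the object itself

print(type(with_comma).__name__)       # tuple
print(type(without_comma).__name__)    # dict
print(with_comma[0] == without_comma)  # True
```

The same thing happens with a DataFrame on the right-hand side, which is why p.merge complains about receiving a tuple.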

edited Jan 09 '20 at 09:46

answered Jan 06 '20 at 22:02

malvoisen


score 0 · Answer 2 · answered Jan 06 '20 at 21:49

Try this snippet and see if it works.

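import glob
import pandas as p

extension = 'csv'
all_filenames = glob.glob('*.{}'.format(extension))

mergespec = None
for f in all_filenames:
    # No trailing comma and no list(): read_csv takes the path and returns a DataFrame.
    # Naming the intensity column after the file keeps the merged columns distinct.
    file = p.read_csv(f, header=None, names=["WVNB", f])
    # Start from the first file, then outer-merge each later one on the shared column.
    mergespec = file if mergespec is None else p.merge(mergespec, file, on="WVNB", how='outer')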
