Recursive pd.merge() output error

Question

I want to take a collection of CSV files that share a common index, time t, and merge them all together with a single function called mergedf(). It looked like it worked, except that it printed the same set of values three times. It seems to be printing filepath[0] three times because of my if statement. Alternatively, the problem could be intdf in the prepdf() function.

If you could help me spot my error, that would be amazing.

In:

import numpy as np
import pandas as pd
from pandas import DataFrame
from scipy import interpolate

def prepdf(path, mi, ma):
    csv = pd.read_csv(path, usecols=[0,1], skiprows=1, names = ['t','b'])
    df = DataFrame(csv)

    fs = 2  
    T = 1/fs  
    ts = np.arange(mi, ma, T)

    interpdata = {}

    for key in ['b']:
        spl = interpolate.interp1d(df['t'], df[key])
        interpdata[key] = spl(ts)

    interpframe = pd.DataFrame(interpdata, index=ts)
    interpframe.index.name = 'ts'
    interpframe.reset_index(inplace=True)
    interpframe['t'] = interpframe['ts']
    temp = interpframe.loc[interpframe['b'] > 0.5, 't']
    interpframe.loc[interpframe['b'] > 0.5, 't'] = temp
    interpframe['t'] = interpframe['t'].fillna(method='ffill')
    interpframe.set_index('t', inplace=True)
    inttmp = interp_frame
    intdf = interp_frame.head(n=len(inttmp))

    return intdf   

PATHS = ['data1.csv', 'data2.csv', 'data3.csv']
filepath = [file for file in PATHS]

for path in PATHS:
    df = prepdf(path, 650, 1000)
    print(df)

print(len(PATHS))

def mergedf(n):
    if len(PATHS)-1-n == 0:
        return prepdf(filepath[0], 650, 1000)
    else:
        return pd.merge(prepdf(filepath[len(PATHS)-1-n], 650, 1000), mergedf(n+1), left_on='t', right_on='t')

mergedf(0)

Out(mergedf(0)):

    t       b           b_x         b_y
0   650.0   0.105299    0.105299    0.105299
1   650.5   0.193072    0.193072    0.193072
2   651.0   0.115404    0.115404    0.115404
3   651.5   0.047509    0.047509    0.047509
4   652.0   0.119501    0.119501    0.119501
5   652.5   -0.187888   -0.187888   -0.187888
...     ...     ...     ...     ...
695     997.5   0.165262    0.165262    0.165262
696     998.0   -0.131729   -0.131729   -0.131729
697     998.5   0.038266    0.038266    0.038266
698     999.0   0.093568    0.093568    0.093568
699     999.5   0.022013    0.022013    0.022013

700 rows × 4 columns

Here is an example of a CSV DataFrame:

     t         b
0    650.0  0.105299
1    650.5  0.193072
2    651.0  0.115404
3    651.5  0.047509
4    652.0  0.119501
5    652.5 -0.187888
     ...    ...

Just wondering... do you mean to "merge" or "concatenate"? Because merge is a horizontal operation... — cs95, Jul 16 '17 at 08:16
@cᴏʟᴅsᴘᴇᴇᴅ Well the csvs are shaped m by 2 with a common index that I want them to "merge" on. So b, b_x, and b_y are supposed to be separate csvs made into dataframes — Julian Rachman, Jul 16 '17 at 08:20
I see. Have you taken a look at [this](https://stackoverflow.com/questions/38089010/merge-a-list-of-pandas-dataframes)? — cs95, Jul 16 '17 at 08:21
@cᴏʟᴅsᴘᴇᴇᴅ Yes I have although it was throwing me off because they deal with identical values when I don't have identical values in any of the dataframes. — Julian Rachman, Jul 16 '17 at 08:23
@JulianRachman, how do you want to merge your data sets if you `"don't have identical values in any of the dataframes"`? — MaxU - stand with Ukraine, Jul 16 '17 at 08:35

MaxU - stand with Ukraine · Accepted Answer · 2017-07-16T08:47:13.803

If I understand correctly:

df = pd.concat([prepdf(x, 650, 1000) for x in PATHS], axis=1)
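As a rough sketch of what that one-liner does, the snippet below concatenates three frames that share the same index along axis=1. The frames are made-up stand-ins for the output of prepdf() (the values, the names data1/data2/data3 and the b_ column prefix are assumptions for illustration only), and the keys argument is used so each column can be traced back to its source:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the frames prepdf() would return:
# one 'b' column each, all indexed by the same time grid 't'.
ts = np.arange(650, 652, 0.5)
frames = [
    pd.DataFrame({"b": np.full(len(ts), float(i))}, index=pd.Index(ts, name="t"))
    for i in range(3)
]

# axis=1 places the frames side by side, aligned on the shared index.
# keys labels each block, producing a two-level column index.
combined = pd.concat(frames, axis=1, keys=["data1", "data2", "data3"])

# Flatten the two-level columns into single names such as 'b_data1'.
combined.columns = [f"b_{name}" for name, _ in combined.columns]
print(combined)
```

Because the alignment happens on the index, no recursion or repeated pd.merge() call is needed when every frame uses the same time grid.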

UPDATE:

I guess the same data set was shown three times because of the following lines:

intdf = interp_frame.head(n=len(inttmp))

return intdf

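interp_frame is not defined inside the function. Most probably it was defined earlier in your Python environment (IPython, Jupyter, etc.), so every call returned that same leftover object instead of the interpframe built inside prepdf().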
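The effect can be reproduced without pandas. The following is only an illustrative sketch: build_buggy and the dictionary values are hypothetical, and globals() is used to imitate a variable left over from an earlier notebook cell. A misspelled name inside a function is not an error as long as a global with that name exists, so every call quietly returns the same stale object:

```python
def build_buggy(value):
    interpframe = {"b": value}  # the local result we meant to return
    return interp_frame         # typo: not a local, so Python looks up a global


# Imitate a variable left behind by an earlier cell in the session.
stale = {"b": "stale"}
globals()["interp_frame"] = stale

# Three calls with different inputs all hand back the same leftover object.
results = [build_buggy(v) for v in (1, 2, 3)]
same_object = all(r is stale for r in results)

# Once the leftover global is gone, the typo surfaces as a NameError.
del globals()["interp_frame"]
try:
    build_buggy(1)
    raised = False
except NameError:
    raised = True

print(same_object, raised)
```

Restarting the kernel and re-running the code from a clean session is a quick way to expose this kind of mistake, since the stray global no longer exists.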

edited Jul 16 '17 at 08:47

answered Jul 16 '17 at 08:37

I am still getting identical b-values in the columns following t. So therefore what you have written doesn't work. – Julian Rachman Jul 16 '17 at 08:38
@JulianRachman, I guess it's because you are using `interp_frame`, which is not defined in your function... Pay attention to the underscore character in the variable name – MaxU - stand with Ukraine Jul 16 '17 at 08:40
Alright we are good! Thank you. I did not think I would need to write out such a long question for such a small syntactical error. – Julian Rachman Jul 16 '17 at 08:43
