1

I want to be able to take a collection of CSV files who share a common index and time t with each other and I want to merge them all together using one function called mergedf(). It looked to me like it worked except it printed the same set of values 3 times. It seems as though it is printing filepath[0] 3 times based off of my if statement. In addition, it could also be intdf in the prepdf() function.

If you could help me spot my error that would be amazing.

In:

def prepdf(path, mi, ma):
    csv = pd.read_csv(path, usecols=[0,1], skiprows=1, names = ['t','b'])
    df = DataFrame(csv)

    fs = 2  
    T = 1/fs  
    ts = np.arange(mi, ma, T)

    interpdata = {}

    for key in ['b']:
        spl = interpolate.interp1d(df['t'], df[key])
        interpdata[key] = spl(ts)

    interpframe = pd.DataFrame(interpdata, index=ts)
    interpframe.index.name = 'ts'
    interpframe.reset_index(inplace=True)
    interpframe['t'] = interpframe['ts']
    temp = interpframe.loc[interpframe['b'] > 0.5, 't']
    interpframe.loc[interpframe['b'] > 0.5, 't'] = temp
    interpframe['t'] = interpframe['t'].fillna(method='ffill')
    interpframe.set_index('t', inplace=True)
    inttmp = interp_frame
    intdf = interp_frame.head(n=len(inttmp))

    return intdf   

PATHS = ['data1.csv', 'data2.csv', 'data3.csv']
filepath = [file for file in PATHS]

for path in PATHS:
    df = prepdf(path, 650, 1000)
    print(df)

print(len(PATHS))

def mergedf(n):
    if len(PATHS)-1-n == 0:
        return prepdf(filepath[0], 650, 1000)
    else:
        return pd.merge(prepdf(filepath[len(PATHS)-1-n], 650, 1000), mergedf(n+1), left_on='t', right_on='t')

mergedf(0)

Out(mergedf(0)):

    t       b           b_x         b_y
0   650.0   0.105299    0.105299    0.105299
1   650.5   0.193072    0.193072    0.193072
2   651.0   0.115404    0.115404    0.115404
3   651.5   0.047509    0.047509    0.047509
4   652.0   0.119501    0.119501    0.119501
5   652.5   -0.187888   -0.187888   -0.187888
...     ...     ...     ...     ...
695     997.5   0.165262    0.165262    0.165262
696     998.0   -0.131729   -0.131729   -0.131729
697     998.5   0.038266    0.038266    0.038266
698     999.0   0.093568    0.093568    0.093568
699     999.5   0.022013    0.022013    0.022013

700 rows × 4 columns

Here is an example of a CSV DataFrame:

     t         b
0    650.0  0.105299
1    650.5  0.193072
2    651.0  0.115404
3    651.5  0.047509
4    652.0  0.119501
5    652.5 -0.187888
     ...    ...
Julian Rachman
  • 575
  • 3
  • 15

1 Answers1

0

IIUC:

df = pd.concat([prepdf(x, 650, 1000) for x in PATHS], axis=1)

UPDATE:

i guess the problem of showing you the same data set three times was caused by the following lines:

intdf = interp_frame.head(n=len(inttmp))

return intdf   

interp_frame - is not defined in the function. Most probably it was defined before in your Python environment (iPython, Jupyter, etc.)

MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419