0

I have an excel file that has over 5000 rows and 32 columns. The data represents 16 tests. It is stored like this: The first column is the x values that belong to the y values that are stored in the second column. The third column is the x values that belong to the y values that are stored in the fourth column, and so on... Since the values can be subdivided into 4 tests (4 tests of 8 columns is 32 columns) I would like to store them as follows:

  • column 1 = xtest11
  • column 2 = ytest11
  • column 3 = xtest12
  • column 4 = ytest12
  • column 5 = xtest13
  • column 6 = ytest13
  • column 7 = xtest14
  • column 8 = ytest14
  • column 9 = xtest21
  • column 10 = ytest22
  • column 11 = xtest23
  • column 12 = ytest24
  • ...
  • column 31 = xtest44
  • column 32 = ytest44

Right now I have this all written out by using many lines of code. I would like to know if it is possible to use a for loop or something else to make this code look better. I am new to python (spyder) and I would like to learn this things to make it easier for me along the way. Thank you very much.

F0=-33.5
df = pd.read_excel (r'D:\GEO_ENGINEERING\Thesis\Resultaten instrong\datazuiver.xlsx') #place "r" before the path string to address special character, such as '\'. Don't forget to put the file name at the end of the path + '.xlsx'
    
    

#%% Test 1

T11=df["Time_1_1"]
F11=df["F_1_1"]-F0
T12=df["Time_1_2"]
F12=df["F_1_2"]-F0
T13=df["Time_1_3"]
F13=df["F_1_3"]-F0
T14=df["Time_1_4"]
F14=df["F_1_4"]-F0
T11=T11[1:5000]
T12=T12[1:5000]
T13=T13[1:5000]
T14=T14[1:5000]
F11=F11[1:5000]
F12=F12[1:5000]
F13=F13[1:5000]
F14=F14[1:5000]
plt.figure()
plt.plot(T11,F11, label='Proef 1 test 1')
plt.plot(T12,F12, label='Proef 1 test 2')
plt.plot(T13,F13, label='Proef 1 test 3')
plt.plot(T14,F14, label='Proef 1 test 4')
plt.xlabel('Time [s]')
plt.ylabel('Force [N]')
plt.title('Test 1')
plt.show()

#%%
df = pd.read_excel (r'D:\GEO_ENGINEERING\Thesis\Resultaten instrong\datazuiver.xlsx') #place "r" before the path string to address special character, such as '\'. Don't forget to put the file name at the end of the path + '.xlsx'
T21=df["Time_2_1"]
F21=df["F_2_1"]-F0
T22=df["Time_2_2"]
F22=df["F_2_2"]-F0
T23=df["Time_2_3"]
F23=df["F_2_3"]-F0
T24=df["Time_2_4"]
F24=df["F_2_4"]-F0
T21=T21[1:2366]
T22=T22[1:2366]
T23=T23[1:2366]
T24=T24[1:2366]
F21=F21[1:2366]
F22=F22[1:2366]
F23=F23[1:2366]
F24=F24[1:2366]
plt.figure()
plt.plot(T21,F21, label='Proef 2 test 1')
plt.plot(T22,F22, label='Proef 2 test 2')
plt.plot(T23,F23, label='Proef 2 test 3')
plt.plot(T24,F24, label='Proef 2 test 4')
plt.xlabel('Time [s]')
plt.ylabel('Force [N]')
plt.title('Test 2')
plt.show()
#%% Proef 3


T31=df["Time_3_1"]
F31=df["F_3_1"]-F0
T32=df["Time_3_2"]
F32=df["F_3_2"]-F0
T33=df["Time_3_3"]
F33=df["F_3_3"]-F0
T34=df["Time_3_4"]
F34=df["F_3_4"]-F0
T31=T31[1:2366]
T32=T32[1:2344]
T33=T33[1:2366]
T34=T34[1:2366]
F31=F31[1:2366]
F32=F32[1:2344]
F33=F33[1:2366]
F34=F34[1:2366]
plt.figure()
plt.plot(T31,F31, label='Proef 3 test 1')
plt.plot(T32,F32, label='Proef 3 test 2')
plt.plot(T33,F33, label='Proef 3 test 3')
plt.plot(T34,F34, label='Proef 3 test 4')
plt.xlabel('Time [s]')
plt.ylabel('Force [N]')
plt.title('Test 3 (no drillgrout)')
plt.legend()
plt.show()

#%% Proef 4

T41=df["Time_4_1"]
F41=df["F_4_1"]-F0
T42=df["Time_4_2"]
F42=df["F_4_2"]-F0
T43=df["Time_4_3"]
F43=df["F_4_3"]-F0
T44=df["Time_4_4"]
F44=df["F_4_4"]-F0
T41=T41[1:2366]
T42=T42[1:2366]
T43=T43[1:2366]
T44=T44[1:2366]
F41=F41[1:2366]
F42=F42[1:2366]
F43=F43[1:2366]
F44=F44[1:2366]
plt.figure()
plt.plot(T41,F41, label='Proef 4 test 1')
plt.plot(T42,F42, label='Proef 4 test 2')
plt.plot(T43,F43, label='Proef 4 test 3')
plt.plot(T44,F44, label='Proef 4 test 4')
plt.xlabel('Time [s]')
plt.ylabel('Force [N]')
plt.title('Test 4')
plt.legend()
plt.show()
#%% Plot all the tests
plt.figure()
plt.plot(T11,F11, label='Proef 1 test 1')
plt.plot(T12,F12, label='Proef 1 test 2')
plt.plot(T13,F13, label='Proef 1 test 3')
plt.plot(T14,F14, label='Proef 1 test 4')
plt.plot(T21,F21, label='Proef 2 test 1')
plt.plot(T22,F22, label='Proef 2 test 2')
plt.plot(T23,F23, label='Proef 2 test 3')
plt.plot(T24,F24, label='Proef 2 test 4')
plt.plot(T41,F41, label='Proef 4 test 1')
plt.plot(T42,F42, label='Proef 4 test 2')
plt.plot(T43,F43, label='Proef 4 test 3')
plt.plot(T44,F44, label='Proef 4 test 4')
plt.xlabel('Time [s]')
plt.ylabel('Force [N]')
plt.title('Test 1, 2, 3 and 4')
plt.legend()
plt.show()
#%% Plot of test 2 and 4mat
plt.figure()
plt.plot(T21,F21, label='Proef 2 test 1')
plt.plot(T22,F22, label='Proef 2 test 2')
plt.plot(T23,F23, label='Proef 2 test 3')
plt.plot(T24,F24, label='Proef 2 test 4')
plt.plot(T41,F41, label='Proef 4 test 1')
plt.plot(T42,F42, label='Proef 4 test 2')
plt.plot(T43,F43, label='Proef 4 test 3')
plt.plot(T44,F44, label='Proef 4 test 4')
plt.xlabel('Time [s]')
plt.ylabel('Force [N]')
plt.title('Test 2 and 4')
plt.legend()
plt.show()

1 Answers1

0

This is assuming that you only have one dataframe. It looks like you import the same one twice: once in the start and once in the middle.

If you want to try to reduce the number of lines, you could probably write a simple function that takes whatever "Proef" numbers, the title you want to have in the plot, and the dataframe, and plot those. For example,

import pandas as pd
import matplotlib.pyplot as plt

F0=-33.5
DATA_END_VALUES = [[5000]*4] + [[2366]*4] + [[2366, 2344]*2] + [[2366]*4]
# First index of DATA_END_VALUES is the proef number while the second is 
# the test number

def plot_results(df, proef_numbers, title):
    for proef  in proef_numbers:
        for test in range(1,5):
            time_data = df[f"Time_{proef}_{test}"][1:DATA_END_VALUES[proef-1][test-1]] 
            f_data = df[f"F_{proef}_{test}"][1:DATA_END_VALUES[proef-1][test-1]] - F0 
            plt.plot(time_data, f_data, label=f"Proef {proef} test {test}" )
    plt.xlabel('Time [s]')
    plt.ylabel('Force [N]')
    plt.title(title)
    plt.legend()
    plt.show()

df = pd.read_excel ("data.xlsx")
proef_numbers = [1,2]
title = "Tests 1 and 2"
plot_results(df, proef_numbers ,title)

I made some dummy data, and it seems to behave as expected. You can check with your real data to see if it works for it. Keep in mind for plots with more data, the legend might take up a large part of the plot. You will need to tweak it later.

You also mentioned that you wanted to rename your column names to itestjk where i is whether the x or y variable, j is the "proef" number, and k is the "test" number. If you make those changes, you will just need to change the time_data line and the f_data to the following:


time_data = df[f"xtest{proef}{test}"][1:DATA_END_VALUES[proef-1][test-1]]
f_data = df[f"ytest{proef}{test}"][1:DATA_END_VALUES[proef-1][test-1]] - F0

ZachS
  • 146
  • 5
  • Thank you very much for this answer. This is exactly what I've looking for. Learned a lot by it. Do you maybe know how I can store the arrays that I plot. Right now it just overwrites the data in time_data and f_data. I want to have an output of T{proef}{test}. I am looking for the command that does the same thing as you have done with df[f"F_{proef}_{test}"] but then in the name of the data so to say. – Lodewijk Pleij Sep 30 '21 at 09:52
  • So you want to store the arrays in another place besides the dataframe? If it was my data, I would just go back to the dataframe to get the data instead of creating 32 named variables. If I ran into performance issues, I would use a nested dictionary. Dynamically creating variables is not very common. There are usually better ways to solve the problem like dictionaries. [This stack overflow question](https://stackoverflow.com/questions/5036700/how-can-you-dynamically-create-variables) gives a way to dynamically create the variables though if there are no better ways of solving your problem. – ZachS Sep 30 '21 at 21:42
  • Thank you very much for your help. I understand what you mean! – Lodewijk Pleij Oct 01 '21 at 07:48
  • You're welcome! If this answer or any other one solved your issue, please mark it as accepted. – ZachS Oct 01 '21 at 19:40