How can we find the last row, in a specific column, in a data frame?

Question

I have a data frame named 'data["df"]'. The data looks like this.

How can I get the last row in the column named 'adjclose'? The number that I want is '28.430000'.

I am trying to append this to a list. I tested this piece of code:

print(str(data.iloc[-1, data.columns.get_loc("adjclose")]) = 0))

I get this: AttributeError: 'dict' object has no attribute 'iloc'

What am I doing wrong here?

Here is my full code.

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout, Bidirectional
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from yahoo_fin import stock_info as si
from collections import deque

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import time
import os
import random


# set seed, so we can get the same results after rerunning several times
np.random.seed(314)
tf.random.set_seed(314)
random.seed(314)



def load_data(ticker, n_steps=50, scale=True, shuffle=True, lookup_step=1, 
                test_size=0.2, feature_columns=['adjclose', 'volume', 'open', 'high', 'low']):
    # see if ticker is already a loaded stock from yahoo finance
    if isinstance(ticker, str):
        # load it from yahoo_fin library
        df = si.get_data(ticker)
    elif isinstance(ticker, pd.DataFrame):
        # already loaded, use it directly
        df = ticker
    # this will contain all the elements we want to return from this function
    result = {}
    # we will also return the original dataframe itself
    result['df'] = df.copy()
    # make sure that the passed feature_columns exist in the dataframe
    for col in feature_columns:
        assert col in df.columns, f"'{col}' does not exist in the dataframe."
    if scale:
        column_scaler = {}
        # scale the data (prices) from 0 to 1
        for column in feature_columns:
            scaler = preprocessing.MinMaxScaler()
            df[column] = scaler.fit_transform(np.expand_dims(df[column].values, axis=1))
            column_scaler[column] = scaler

        # add the MinMaxScaler instances to the result returned
        result["column_scaler"] = column_scaler
    # add the target column (label) by shifting by `lookup_step`
    df['future'] = df['adjclose'].shift(-lookup_step)
    # last `lookup_step` columns contains NaN in future column
    # get them before droping NaNs
    last_sequence = np.array(df[feature_columns].tail(lookup_step))
    # drop NaNs
    df.dropna(inplace=True)
    sequence_data = []
    sequences = deque(maxlen=n_steps)
    for entry, target in zip(df[feature_columns].values, df['future'].values):
        sequences.append(entry)
        if len(sequences) == n_steps:
            sequence_data.append([np.array(sequences), target])
    # get the last sequence by appending the last `n_step` sequence with `lookup_step` sequence
    # for instance, if n_steps=50 and lookup_step=10, last_sequence should be of 59 (that is 50+10-1) length
    # this last_sequence will be used to predict in future dates that are not available in the dataset
    last_sequence = list(sequences) + list(last_sequence)
    # shift the last sequence by -1
    last_sequence = np.array(pd.DataFrame(last_sequence).shift(-1).dropna())
    # add to result
    result['last_sequence'] = last_sequence
    # construct the X's and y's
    X, y = [], []
    for seq, target in sequence_data:
        X.append(seq)
        y.append(target)
    # convert to numpy arrays
    X = np.array(X)
    y = np.array(y)
    # reshape X to fit the neural network
    X = X.reshape((X.shape[0], X.shape[2], X.shape[1]))
    # split the dataset
    result["X_train"], result["X_test"], result["y_train"], result["y_test"] = train_test_split(X, y, test_size=test_size, shuffle=shuffle)
    # return the result
    return result

def create_model(sequence_length, units=256, cell=LSTM, n_layers=2, dropout=0.3,
                loss="mean_absolute_error", optimizer="rmsprop", bidirectional=False):
    model = Sequential()
    for i in range(n_layers):
        if i == 0:
            # first layer
            if bidirectional:
                model.add(Bidirectional(cell(units, return_sequences=True), input_shape=(None, sequence_length)))
            else:
                model.add(cell(units, return_sequences=True, input_shape=(None, sequence_length)))
        elif i == n_layers - 1:
            # last layer
            if bidirectional:
                model.add(Bidirectional(cell(units, return_sequences=False)))
            else:
                model.add(cell(units, return_sequences=False))
        else:
            # hidden layers
            if bidirectional:
                model.add(Bidirectional(cell(units, return_sequences=True)))
            else:
                model.add(cell(units, return_sequences=True))
        # add dropout after each layer
        model.add(Dropout(dropout))
    model.add(Dense(1, activation="linear"))
    model.compile(loss=loss, metrics=["mean_absolute_error"], optimizer=optimizer)
    return model


# Window size or the sequence length
N_STEPS = 100
# Lookup step, 1 is the next day
LOOKUP_STEP = 1
# test ratio size, 0.2 is 20%
TEST_SIZE = 0.2
# features to use
FEATURE_COLUMNS = ["adjclose", "volume", "open", "high", "low"]
# date now
date_now = time.strftime("%Y-%m-%d")
### model parameters
N_LAYERS = 3
# LSTM cell
CELL = LSTM
# 256 LSTM neurons
UNITS = 256
# 40% dropout
DROPOUT = 0.4
# whether to use bidirectional RNNs
BIDIRECTIONAL = False
### training parameters
# mean absolute error loss
# LOSS = "mae"
# huber loss
LOSS = "huber_loss"
OPTIMIZER = "adam"
BATCH_SIZE = 64
EPOCHS = 2


# save the dataframe
from datetime import date
from datetime import datetime

start_date = date(2020, 3, 1)
end_date = date(2020, 9, 18)
days = np.busday_count(start_date, end_date)



# Apple stock market

tickers = ['CHIS']

thelen = len(tickers)

z=0
all_stocks=[]

for ticker in tickers:
    ticker_data_filename = os.path.join("data", f"{ticker}_{date_now}.csv")
    # model name to save, making it as unique as possible based on parameters
    model_name = f"{date_now}_{ticker}-{LOSS}-{OPTIMIZER}-{CELL.__name__}-seq-{N_STEPS}-step-{LOOKUP_STEP}-layers-{N_LAYERS}-units-{UNITS}"
    if BIDIRECTIONAL:
        model_name += "-b"
    
    
    # create these folders if they does not exist
    if not os.path.isdir("results"):
        os.mkdir("results")
    if not os.path.isdir("logs"):
        os.mkdir("logs")
    if not os.path.isdir("data"):
        os.mkdir("data")
        
    
    # load the data
    data = load_data(ticker, N_STEPS, lookup_step=LOOKUP_STEP, test_size=TEST_SIZE, feature_columns=FEATURE_COLUMNS)
    
    # save the dataframe
    data["df"].to_csv(ticker_data_filename)
    
    # construct the model
    model = create_model(N_STEPS, loss=LOSS, units=UNITS, cell=CELL, n_layers=N_LAYERS,
                        dropout=DROPOUT, optimizer=OPTIMIZER, bidirectional=BIDIRECTIONAL)
    
    # some tensorflow callbacks
    checkpointer = ModelCheckpoint(os.path.join("results", model_name + ".h5"), save_weights_only=True, save_best_only=True, verbose=1)
    tensorboard = TensorBoard(log_dir=os.path.join("logs", model_name))
    
    history = model.fit(data["X_train"], data["y_train"],
                        batch_size=BATCH_SIZE,
                        epochs=EPOCHS,
                        validation_data=(data["X_test"], data["y_test"]),
                        callbacks=[checkpointer, tensorboard],
                        verbose=1)
    
    model.save(os.path.join("results", model_name) + ".h5")
    
    
    # after the model ends running...or during training, run this
    # tensorboard --logdir="logs"
    # http://localhost:6006/
    
    
    data = load_data(ticker, N_STEPS, lookup_step=LOOKUP_STEP, test_size=TEST_SIZE,
                    feature_columns=FEATURE_COLUMNS, shuffle=False)
    
    # construct the model
    model = create_model(N_STEPS, loss=LOSS, units=UNITS, cell=CELL, n_layers=N_LAYERS,
                        dropout=DROPOUT, optimizer=OPTIMIZER, bidirectional=BIDIRECTIONAL)
    
    model_path = os.path.join("results", model_name) + ".h5"
    model.load_weights(model_path)
    
    
    # evaluate the model
    mse, mae = model.evaluate(data["X_test"], data["y_test"], verbose=0)
    # calculate the mean absolute error (inverse scaling)
    mean_absolute_error = data["column_scaler"]["adjclose"].inverse_transform([[mae]])[0][0]
    #print("Mean Absolute Error:", mean_absolute_error)
    
    
    def predict(model, data, classification=False):
        # retrieve the last sequence from data
        last_sequence = data["last_sequence"][:N_STEPS]
        # retrieve the column scalers
        column_scaler = data["column_scaler"]
        # reshape the last sequence
        last_sequence = last_sequence.reshape((last_sequence.shape[1], last_sequence.shape[0]))
        # expand dimension
        last_sequence = np.expand_dims(last_sequence, axis=0)
        # get the prediction (scaled from 0 to 1)
        prediction = model.predict(last_sequence)
        # get the price (by inverting the scaling)
        predicted_price = column_scaler["adjclose"].inverse_transform(prediction)[0][0]
        return predicted_price
    
    
    # predict the future price
    future_price = predict(model, data)
    print(f"Future price after {LOOKUP_STEP} days is ${future_price:.2f}")

    z=z+1
    print(str(z) + ' of ' + str(thelen))

    
    # print(new_seriesdata['Adj Close'].iloc[-1])
    all_stocks.append(ticker + ' act: ' + str(future_price) + ' prd: ' + str(future_price))

It appears that your `data` is not a dataframe, but a dictionary. Please provide a complete example. However, from your remark "I have a data frame named 'data["df"]'", perhaps `data` isn't your dataframe but `data['df']` is? — Grismar, Sep 24 '20 at 00:49
Once you do have your dataframe, note that there's a well documented function to get the last records, called `tail()`. Something as simple as `your_df.tail(1)['adjclose']` is probably all you need. — Grismar, Sep 24 '20 at 00:56
Does this answer your question? [How to get the last N rows of a pandas DataFrame?](https://stackoverflow.com/questions/14663004/how-to-get-the-last-n-rows-of-a-pandas-dataframe) — ndclt, Sep 24 '20 at 00:58
If it's a dictionary, you can get the item you need simply using `data['adjclose'][-1]` — NotAName, Sep 24 '20 at 01:02

score 2 · Accepted Answer · answered Sep 24 '20 at 01:01

2

As Grismar answered, yours is currently defined as a dictionary. If you get it in a DataFrame format, you could also represent it as a list of values from a column:

a = your_df['adjclose']

And then retrieve the last value of the list (last row accordingly).

print(a[len(a)-1])

answered Sep 24 '20 at 01:01

Nikolas

49
3

Thanks everyone. I'm kind of lost here. I tried these options: df['adjclose'] Yields: KeyError: 'adjclose' df.tail(1)['adjclose'] Yields: KeyError: 'adjclose' df['adjclose'][-1] Yields: KeyError: 'adjclose' – ASH Sep 24 '20 at 01:31
Can you please post how you are structuring your df? Seems like adjclose is not present in your df. – Nikolas Sep 24 '20 at 02:57
I just posted my full code. It may be a bit confusing, but hopeful it helps more than ti hurts!! – ASH Sep 24 '20 at 03:22
@ Grismar & pavel, you are both right. THese are two different ways to do the same thing. Thanks a lot!! One more question. Why do I get 450 records in 'data["df"]' but I only have 147 days in 'days'. I would expect 147 records in 'data["df"]'. Why is there a discrepancy? – ASH Sep 24 '20 at 12:31
@ Grismar & pavel, I figured it out. I just sliced the data frame and I'm getting what I want now. Thanks everyone!! – ASH Sep 24 '20 at 13:37

How can we find the last row, in a specific column, in a data frame?

1 Answers1