1

I am trying to replace string values in a column without creating a copy. I have looked at the docs provided in the warning and also this question. I have also tried using .replace() with the same results. What am I not understanding? Code:

import pandas as pd
from datetime import timedelta

# set csv file as constant
TRADER_READER = pd.read_csv('TastyTrades.csv')
TRADER_READER['Strategy'] = ''

def iron_condor():
    TRADER_READER['Date'] = pd.to_datetime(TRADER_READER['Date'], format="%Y-%m-%d %H:%M:%S")
    a = 0
    b = 1
    c = 2
    d = 3
    for row in TRADER_READER.index:
        start_time = TRADER_READER['Date'][a]
        end_time = start_time + timedelta(seconds=5)
        e = TRADER_READER.iloc[a]
        f = TRADER_READER.iloc[b]
        g = TRADER_READER.iloc[c]
        h = TRADER_READER.iloc[d]
        if start_time <= f['Date'] <= end_time and f['Underlying Symbol'] == e['Underlying Symbol']:
            if start_time <= g['Date'] <= end_time and g['Underlying Symbol'] == e['Underlying Symbol']:
                if start_time <= h['Date'] <= end_time and h['Underlying Symbol'] == e['Underlying Symbol']:
                    e.loc[e['Strategy']] = 'Iron Condor'
                    f.loc[f['Strategy']] = 'Iron Condor'
                    g.loc[g['Strategy']] = 'Iron Condor'
                    h.loc[h['Strategy']] = 'Iron Condor'
                    print(e, f, g, h)
        if (d + 1) > int(TRADER_READER.index[-1]):
            break
        else:
            a += 1
            b += 1
            c += 1
            d += 1


iron_condor()

Warning:

SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_with_indexer(indexer, value)

Hopefully this satisfies the data needed to replicate:

,Date,Type,Action,Symbol,Instrument Type,Description,Value,Quantity,Average Price,Commissions,Fees,Multiplier,Underlying Symbol,Expiration Date,Strike Price,Call or Put
36,2019-12-31 16:01:44,Trade,BUY_TO_OPEN,QQQ   200103P00206500,Equity Option,Bought 1 QQQ 01/03/20 Put 206.50 @ 0.07,-7,1,-7,-1.0,-0.14,100.0,QQQ,1/3/2020,206.5,PUT
37,2019-12-31 16:01:44,Trade,BUY_TO_OPEN,QQQ   200103C00217500,Equity Option,Bought 1 QQQ 01/03/20 Call 217.50 @ 0.03,-3,1,-3,-1.0,-0.14,100.0,QQQ,1/3/2020,217.5,CALL
38,2019-12-31 16:01:44,Trade,SELL_TO_OPEN,QQQ   200103P00209000,Equity Option,Sold 1 QQQ 01/03/20 Put 209.00 @ 0.14,14,1,14,-1.0,-0.15,100.0,QQQ,1/3/2020,209.0,PUT
39,2019-12-31 16:01:44,Trade,SELL_TO_OPEN,QQQ   200103C00214500,Equity Option,Sold 1 QQQ 01/03/20 Call 214.50 @ 0.30,30,1,30,-1.0,-0.15,100.0,QQQ,1/3/2020,214.5,CALL
40,2020-01-03 16:08:13,Trade,BUY_TO_CLOSE,QQQ   200103C00214500,Equity Option,Bought 1 QQQ 01/03/20 Call 214.50 @ 0.07,-7,1,-7,0.0,-0.14,100.0,QQQ,1/3/2020,214.5,CALL

Expected result:

,Date,Type,Action,Symbol,Instrument Type,Description,Value,Quantity,Average Price,Commissions,Fees,Multiplier,Underlying Symbol,Expiration Date,Strike Price,Call or Put
36,2019-12-31 16:01:44,Trade,BUY_TO_OPEN,QQQ   200103P00206500,Equity Option,Bought 1 QQQ 01/03/20 Put 206.50 @ 0.07,-7,1,-7,-1.0,-0.14,100.0,QQQ,1/3/2020,206.5,PUT,Iron Condor
37,2019-12-31 16:01:44,Trade,BUY_TO_OPEN,QQQ   200103C00217500,Equity Option,Bought 1 QQQ 01/03/20 Call 217.50 @ 0.03,-3,1,-3,-1.0,-0.14,100.0,QQQ,1/3/2020,217.5,CALL,Iron Condor
38,2019-12-31 16:01:44,Trade,SELL_TO_OPEN,QQQ   200103P00209000,Equity Option,Sold 1 QQQ 01/03/20 Put 209.00 @ 0.14,14,1,14,-1.0,-0.15,100.0,QQQ,1/3/2020,209.0,PUT,Iron Condor
39,2019-12-31 16:01:44,Trade,SELL_TO_OPEN,QQQ   200103C00214500,Equity Option,Sold 1 QQQ 01/03/20 Call 214.50 @ 0.30,30,1,30,-1.0,-0.15,100.0,QQQ,1/3/2020,214.5,CALL,Iron Condor
40,2020-01-03 16:08:13,Trade,BUY_TO_CLOSE,QQQ   200103C00214500,Equity Option,Bought 1 QQQ 01/03/20 Call 214.50 @ 0.07,-7,1,-7,0.0,-0.14,100.0,QQQ,1/3/2020,214.5,CALL,
Luck Box
  • 90
  • 1
  • 13
  • Look up minimum example https://en.wikipedia.org/wiki/Minimal_working_example. Your code can not be replicated since people don't have the spreadsheet you are reading in. Also your code contains unnecessay lines to understand what could be wrong. – roadrunner66 May 04 '20 at 05:02
  • Please provide a piece of underlying data and the expected result. At the first look it is weird combination that you have a loop with *row* as the control variable but you never use it. Another important missing detail is: Which line of your code causes the warning you presented? – Valdi_Bo May 04 '20 at 05:04
  • I provided more data and the I have provided the entirety of the warning. If you want, I will take a screenshot to prove this. The reason I am using `row` is because it works. – Luck Box May 04 '20 at 05:10
  • I think problem is in `e = TRADER_READER.iloc[a]`, change it like mentioned in dupe by `e = TRADER_READER.iloc[a].copy()` and similar for `f,g,h` – jezrael May 04 '20 at 05:13
  • This is not giving an error but it is not replacing the values. – Luck Box May 04 '20 at 05:17

2 Answers2

2

Let's start from some improvements in the initial part of your code:

  1. The leftmost column of your input file is apparently the index column, so it should be read as the index. The consequence is some different approach to the way to access rows (details later).

  2. The Date column can be converted to datetime64 as early as at the reading time.

So the initial part of your code can be:

TRADER_READER = pd.read_csv('Input.csv', index_col=0, parse_dates=['Date'])
TRADER_READER['Strategy'] = ''

Then I decided to organize the loop other way:

  1. indStart is the integer index of the index column.

  2. As you process your file in "overlapping" couples of 4 consecutive rows, a more natural way to organize the loop is to stop on 4-th row from the end. So the loop is over the range(TRADER_READER.index.size - 3).

  3. Indices of 4 rows of interest can be read from the respective slice of the index, i.e. [indStart : indStart + 4]

  4. Check of particular row can be performed with a nested function.

  5. To avoid your warning, setting of values in Strategy column should be performed using loc on the original DataFrame, with row parameter for the respective row and column parameter for Strategy.

  6. The whole update (for the current couple of 4 rows) can be performed in a single instruction, specifying row parameter as a slice, from a thru d.

So the code can be something like below:

def iron_condor():
    def rowCheck(row):
        return start_time <= row.Date <= end_time and row['Underlying Symbol'] == undSymb

    for indStart in range(TRADER_READER.index.size - 3):
        a, b, c, d = TRADER_READER.index[indStart : indStart + 4]
        e = TRADER_READER.loc[a]
        undSymb = e['Underlying Symbol']
        start_time = e.Date
        end_time = start_time + pd.Timedelta('5S')
        if rowCheck(TRADER_READER.loc[b]) and rowCheck(TRADER_READER.loc[c]) and rowCheck(TRADER_READER.loc[d]):
            TRADER_READER.loc[a:d, 'Strategy'] = 'Iron Condor'
            print('New values:')
            print(TRADER_READER.loc[a:d])

No need to increment a, b, c and d. Neither break is needed.

Edit

If for some reason you have to do other updates on the rows in question, you can change my code accordingly.

But I don't understand "this csv file will make a new column" in your comment. For now anything you do is performed on the DataFrame in memory. Only after that you can save the DataFrame back to the original file. But note that even your code changes the type of Date column, so I assume you do it once and then the type of this column is just datetime64.

So you probably should change the type of Date column as a separate operation and then (possibly many times) update thie DataFrame and save the updated content back to the source file.

Edit following the comment as of 21:22:46Z

re.search('.*TO_OPEN$', row['Action']) returns a re.Match object if a match has been found, otherwise None.

So can not compare this result with the string searched. If you wanted to get the string matched, you should run e.g.:

mtch = re.search('.*TO_OPEN$', row['Action'])
textFound = None
if mtch:
    textFound = mtch.group(0)

But you actually don't need to do it. It is enough to check whether a match has been found, so the condition can be:

found = bool(re.search('.*TO_OPEN$', row['Action']))

(note that None cast to bool returns False and any non-Null object returns True).

Yet another (probably simpler and quicker) solution is that you run just:

row.Action.endswith('TO_OPEN')

without invoking any regex fuction.

Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41
  • This works. The only thing is that I do have to specify the timestamp within the function as this csv file will make a new column any time anything is done to it. – Luck Box May 04 '20 at 10:15
  • If I simply read the csv file and print, it creates a new unnamed: 0 column which I have an open question about. I don't understand it myself. I am trying to also return a regex search in `rowCheck` with `row['Action'] == re.search('.*TO_OPEN$', row['Action']` but it's not returning anything. I'm assuming I'm grabbing the wrong string to check but what other string can I check? – Luck Box May 04 '20 at 21:22
1

Here is a quite elaborating post that can not only answer your question but also explain in details why things are the case.

Deal with SettingWithCopyWarning

In short if you want to set the value of the original df, either use .replace(inplace=True) or df.loc[condition, theColtoBeSet] = new_val

Boyu Zhang
  • 219
  • 2
  • 12
  • The link you provided is the link I provided, and I have done your example with `TRADER_READER.loc[a, 'Strategy'] = 'Iron Condor'`. Same result. – Luck Box May 04 '20 at 05:08
  • Also I am not quite clear why you need for row `e,f,g,h`, are them different from the remaining rows? I guess you need every four rows in the table to complete your condition determination? Can you provide the latest code that you have updated and ensure the output matches the code? – Boyu Zhang May 04 '20 at 05:47
  • I am doubting your `.loc` is not behaving as you expected, you are using `df.loc[a,'Strategy']`, where a is the index of a row, am I correct? Then it will certainly not updating the col `Strategy` since not col meets the condition `a`. – Boyu Zhang May 04 '20 at 06:19