I'm trying to replicate this Stata loop in Pandas:

forvalues i = 1/6 {    
    gen int codeL`i' = L`i'.location_level_2
    gen int codeF`i' = F`i'.location_level_2    
}

As you can see, I want to create new columns codeL1, codeL2, and so on up to codeL6 (and likewise codeF1 to codeF6), based on the lags and leads of the variable location_level_2.

It is kind of easy in Stata, but as I'm just starting in Pandas, I have no clue.

This would be my attempt:

for i in range(1,7):   
      df[codeLi] = df[location_level_2].shift(i)

for i in range(-1,-7):   
      df[codeLi] = df[location_level_2].shift(i)
Nick Cox
  • Welcome to stack overflow! Please take a moment to look through [How to create good pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) and provide a [mcve] including a sample of your input, and what you would like your output to look like, so that we can better help you. What was the result of the attempt you included above? – G. Anderson Jul 22 '19 at 19:57
  • As another Stata user who has converted to (or is trying to convert to) python/pandas, this was one of the most frustrating aspects. Stata's "macro" functionality makes it easy to iteratively *create* variables with flexible naming as in your example. python/pandas doesn't work this way. See [this page](http://www.data-analysis-in-python.org/python_for_stata.html) for a bit more detail. **Edit:** Leaving for context, but this is not correct, at least in referring to column names. See ALollz's comment below. – Brendan Jul 22 '19 at 20:03
  • @Brendan, python can do exactly the same. `df[f'codeL{i}'] = df['location_level_2'].shift(i)`, and IMO the python syntax is far superior because you need to be explicit with what is a string, and what is a variable. – ALollz Jul 22 '19 at 20:04
  • @ALollz Ah - I clearly didn't know that. Thanks! That could come in quite useful. So that works with pandas column names. I take it python still can't refer to 'python variables' (e.g. data frame names, list names, etc.) with f-string notation, though? – Brendan Jul 22 '19 at 20:06
  • @Brendan It can, but you typically don't want to pollute the global namespace since unlike `locals` in Stata they stick around. In those cases, you'd typically store everything in some container like a dict (See https://stackoverflow.com/questions/1373164/how-do-i-create-a-variable-number-of-variables) which you can then easily select by label. – ALollz Jul 22 '19 at 20:08
  • That's a good clarification. I was sort of equating "can be done but is very strongly advised against and/or not part of the intended functionality" with "can't be done" but of course those aren't strictly the same. – Brendan Jul 22 '19 at 20:13
  • Thanks to everyone who helped. I think I will stick with this solution for now. I'm trying to translate an entire script to Pandas and for many reasons we are working on a database that we still don't have. I will come back in the future for some help, for sure. Thank you all for your patience given my low English level! – Felipe Coy Combita Jul 22 '19 at 20:28

2 Answers

You are very close to a working solution! Here's one that follows the suggestion in the comment by @ALollz:

for i in range(1, 7):
    df[f'codeL{i}'] = df['location_level_2'].shift(i)
    df[f'codeF{i}'] = df['location_level_2'].shift(-i)

Note that the f-string f'codeL{i}' formats the integer from the range into the column name automatically. If you wanted to build the name separately for some reason, you could write new_var = 'codeL' + str(i).
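
A quick check, using only built-in string formatting, that the two spellings give the same column names (the list names here are just for illustration):

```python
# Build the six lag column names two ways and confirm they match.
names_fstring = [f'codeL{i}' for i in range(1, 7)]
names_concat = ['codeL' + str(i) for i in range(1, 7)]

print(names_fstring)
print(names_fstring == names_concat)
```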

Also note that there is no need for a second loop over a range of negative numbers; just pass the negative integer to shift. (As written, range(-1, -7) is empty anyway, because the default step is +1.)
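
Here is a minimal sketch on made-up data (the values in location_level_2 are invented for illustration) showing what the loop produces and how range behaves with negative bounds. It assumes the rows are already sorted in time order and form a single series, because shift moves values by row position and, unlike Stata's L. and F. operators, knows nothing about a tsset/xtset time variable.

```python
import pandas as pd

# Made-up data, assumed to be sorted in time order already.
df = pd.DataFrame({'location_level_2': [10, 20, 30, 40, 50, 60, 70, 80]})

for i in range(1, 7):
    df[f'codeL{i}'] = df['location_level_2'].shift(i)   # lag: value from i rows earlier
    df[f'codeF{i}'] = df['location_level_2'].shift(-i)  # lead: value from i rows later

print(df[['location_level_2', 'codeL1', 'codeF1']])

# range(-1, -7) yields nothing because the default step is +1.
print(list(range(-1, -7)))      # []
print(list(range(-1, -7, -1)))  # [-1, -2, -3, -4, -5, -6]
```

The rows at the edges have nothing to shift in, so they hold NaN and the new columns are stored as floats rather than integers. For panel data you would probably want to shift within each unit, for example df.groupby('panel_id')['location_level_2'].shift(i), where panel_id is a hypothetical identifier column.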

Brendan A.
  • Thanks to you and everyone who helped. I think I will stick with this solution for now. I'm trying to translate an entire script to Pandas and for many reasons we are working on a database that we still don't have. I will come back in the future for some help, for sure. Thank you all for your patience given my low English level! – Felipe Coy Combita Jul 22 '19 at 20:25

This will probably help you:

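import pandas as pd

df = pd.DataFrame([[2, 2, 2], [1, 2, 3]])
for i in range(1, 6):
    # Name the new column from the loop counter; its values are copied from the column at position i
    df['codeL' + str(i)] = df.iloc[:, i]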
razimbres