0

Say I want to run any piece of code I have written (a simple example below):

df_VisitorType_no = pd.DataFrame(columns=['VisitorType_no'])
for i in range(df.shape[0]):
    if df.loc[i,'VisitorType'] == 'Returning_Visitor':
        df_VisitorType_no.loc[i,'VisitorType_no'] = 1
    elif df.loc[i,'VisitorType'] == 'New_Visitor':
        df_VisitorType_no.loc[i,'VisitorType_no'] = 2
    else:
        df_VisitorType_no.loc[i,'VisitorType_no'] = 3

My dataframe df has a huge number of rows and I want to first test out the code I have written to see if I had written it correctly by not running on all the rows in df but just a select few (say first 100 rows) so that I could quickly check the code I have written works correctly.

Instead of doing this:

df = df[0:100,:]

df_VisitorType_no = pd.DataFrame(columns=['VisitorType_no'])
for i in range(df.shape[0]):
    if df.loc[i,'VisitorType'] == 'Returning_Visitor':
        df_VisitorType_no.loc[i,'VisitorType_no'] = 1
    elif df.loc[i,'VisitorType'] == 'New_Visitor':
        df_VisitorType_no.loc[i,'VisitorType_no'] = 2
    else:
        df_VisitorType_no.loc[i,'VisitorType_no'] = 3

Is there a way in Python where I could just specify something like n_rows = 100 at the top of the code, ie. something like this?:

n_rows = 100

df_VisitorType_no = pd.DataFrame(columns=['VisitorType_no'])
for i in range(df.shape[0]):
    if df.loc[i,'VisitorType'] == 'Returning_Visitor':
        df_VisitorType_no.loc[i,'VisitorType_no'] = 1
    elif df.loc[i,'VisitorType'] == 'New_Visitor':
        df_VisitorType_no.loc[i,'VisitorType_no'] = 2
    else:
        df_VisitorType_no.loc[i,'VisitorType_no'] = 3

My question would also apply to arrays if there is a way to do this in both dataframes and arrays. Many thanks.

Leockl
  • 1,906
  • 5
  • 18
  • 51
  • What is wrong with `df1 = df[0:100,:]`? Also, FYR, you do not need that loop at all. – DYZ Feb 09 '20 at 04:26
  • 1
    `df['VisitorType'] = df['VisitorType'].map({'Returning_Visitor': 1, 'New_Visitor: 2, ...})` – cs95 Feb 09 '20 at 04:35
  • Hi @DYZ & @cs95, the code snippet I've above is a simple example. If I've a block of code which is long & complex, it would be beneficial to test it on just the 1st few rows rather than the whole dataset (which say contain 12mil rows) as it'll take forever. So testing it on 1st few rows 1st will save time in checking my code that it works fine. If I've `df1 = df[0:100,:]` at the top, then I would need to change all the `df` in my code block to `df1`. If I've `df = df[0:100,:]` then after testing with the 100 rows & my code works, I will need to rerun code to generate the original `df`. – Leockl Feb 09 '20 at 05:54
  • 1
    Make a `.copy()` of `df` before sclicing it. – DYZ Feb 09 '20 at 06:19

0 Answers0