8

Say I have a dataframe that looks like:

d = {'option1': ['1', '0', '1', '1'], 'option2': ['0', '0', '1', '0'], 'option3': ['1', '1', '0', '0'], 'views': ['6', '10', '5', '2']
df = pd.DataFrame(data=d)

print(df)

  option1 option2 option3 views
0       1       0       1     6
1       0       0       1    10
2       1       1       0     5
3       1       0       0     2

I'm trying to build a for loop that iterates over each column (except the column "views") and each row. If the value of a cell is not 0, I want to replace it with the corresponding value of the column "views" from the same row.

The following output is required (should be easier to understand):

  option1 option2 option3 views
0       6       0       6     6
1       0       0      10    10
2       5       5       0     5
3       2       0       0     2

I tried something like:

df_range = len(df)

for column in df:
    for i in range(df_range):
        if column != 0:
            column = df.views[i]

But I know I'm missing something, it does not work.

Also please note that in my real dataframe, I have dozens of columns, so I need something that iterates over each column automatically. Thanks!!

I saw this thread Update a dataframe in pandas while iterating row by row but it doesn't exactly apply to my problem, because I'm not only going row by row, I also need to go column by column.

Notna
  • 491
  • 2
  • 8
  • 19
  • Try using iterrows – Joe Feb 23 '18 at 15:30
  • Using iterrows would work for the rows, but I'm also trying to iterate over the columns. That's why my question is not exaclty the same as the other thread pointed as the answer... – Notna Feb 23 '18 at 15:42
  • I'll reopen it and take a harder look at the dupiness. – piRSquared Feb 23 '18 at 15:46
  • When you are iterating over a DataFrame with `for column in df`, your column variable will be the column name. `column != 0:` won't work because of that. If you are trying to access that specific cell, you need to check `df[column].iloc[i] !=0`. Know that this is horribly inefficient though. – ayhan Feb 23 '18 at 15:50
  • Is there any efficient alternative you could think of to tackle the problem? Maybe rearrange somehow the dataframe first? – Notna Feb 23 '18 at 15:54
  • 1
    Is this a representative example? If so, I would just multiply option columns with the view column: `df.loc[:, 'option1':'option3'].mul(df['views'], axis=0)` You can assign this back in the same way `df.loc[:, 'option1':'option3']. = df.loc[:, 'option1':'option3'].mul(df['views'], axis=0)` – ayhan Feb 23 '18 at 15:59
  • It was a representative example, so yes in that case your solution is actually perfect, thanks. – Notna Feb 23 '18 at 17:00

4 Answers4

7

You can also achieve the result you want this way:

for col in df:
    if col == 'views':
        continue
    for i, row_value in df[col].iteritems():
        df[col][i] = row_value * df['views'][i]

Notice the following about this solution:

1) This solution operates on each value in the dataframe individually and so is less efficient than broadcasting, because it's performing two loops (one outer, one inner).

2) This solution assumes that option1...option N are binary because essentially this solution is multiplying each binary value in option1...option N with the values in views.

3) This solution will work for any number of option columns. The option columns may have any labels you desire.

4) This solution assumes there is a column labeled views.

Keith Dowd
  • 631
  • 4
  • 10
3

You don't need to iterate through rows. This one should be faster: Ensure that the columns values are integers.

## convert column type to integer
for i in df:
    df[i] = df[i].astype(int)

## update columns
for col in df:
    if col != 'views':
        df[col] = df[col] * df['views']

df

    option1     option2     option3     views
0      6          0            6          6
1      0          0           10         10  
2      5          5            0          5
3      2          0            0          2
YOLO
  • 20,181
  • 5
  • 20
  • 40
2
dataSet = pd.read_excel("dataset.xlsx")
i = 0 ;
for column in dataSet:
    for i in dataSet[column].iteritems():
        if (column == 'views'):
            print (i)
luqman ahmad
  • 171
  • 1
  • 7
  • 1
    Welcome to Stack Overflow! Thank you for the code snippet, which might provide some limited, immediate help. A proper explanation would greatly improve its [long-term value](https://meta.stackexchange.com/q/114762/206345) by describing why this is a good solution to the problem, and would make it more useful to future readers with other similar questions. Please edit your answer to add some explanation, including the assumptions you've made. – sepehr Dec 25 '18 at 10:19
0

I think this would work:

df=df.astype(int)
df[df.columns[:-1]]= np.where(df[df.columns[:-1]]>0, 1, 0)
df[df.columns[:-1]]= df[df.columns[:-1]].mul(df['views'].as_matrix(), axis=0)
  • if your dataframe is binary you don't have to use : df[df.columns[:-1]]= np.where(df[df.columns[:-1]]>0, 1, 0) –  Feb 23 '18 at 16:38