
I have the function below, which:

  • Iterates over each row of my DataFrame
  • Checks, for each row, whether gamestatistics['championname'].values[i] > gamestatistics['enemyname'].values[i]
  • Swaps the positions of the paired elements in that row if the condition is True

I'm wondering whether this could be made faster, as I expect to run it on very large DataFrames, where I imagine it would take a long time. It also seems very manual; I think I'm missing a function that could do this much more quickly.

    import pandas as pd

    def main():
      gamestatistics = pd.read_csv('CvCaggregation.csv')
      for i in range(len(gamestatistics)):
        if gamestatistics['championname'].values[i] > gamestatistics['enemyname'].values[i]:
          print(f"Adjustment at {i}")
          currentChampName = gamestatistics['championname'].values[i]
          currentPosition = gamestatistics['position'].values[i]
          currentEnemy = gamestatistics['enemyname'].values[i]
          CurrentEnemyPosition = gamestatistics['enemyposition'].values[i]
          championWinRate = gamestatistics['championwinrate'].values[i]
          enemyWinRate = gamestatistics['enemywinrate'].values[i]
          gamestatistics.at[i, 'championname'] = currentEnemy
          gamestatistics.at[i, 'enemyname'] = currentChampName
          gamestatistics.at[i, 'position'] = CurrentEnemyPosition
          gamestatistics.at[i, 'enemyposition'] = currentPosition
          gamestatistics.at[i, 'championwinrate'] = enemyWinRate
          gamestatistics.at[i, 'enemywinrate'] = championWinRate
      return gamestatistics

Example Data:

The row below satisfies the condition, since 'Yasuo' > 'Akali' in a string comparison:

[image: example row before the swap]

Therefore we swap the column values to get the row below; that is, columns 1 & 3, 2 & 4, and 5 & 6 are swapped:

[image: the same row after the swap]

Jack
  • This is an XY problem: the decisions that led you to this data structure as a solution to what you want to do are not good ones, so you should not be doing this at all – CJR Sep 26 '22 at 17:58

1 Answer

Does this answer your question? If not, please give us an example of the CSV values.

    import pandas as pd

    gamestatistics = pd.read_csv('CvCaggregation.csv')
    gamestatistics.update(gamestatistics.loc[
        gamestatistics['championname'] > gamestatistics['enemyname']].rename(
            {
                'championname': 'enemyname',
                'enemyname': 'championname',
                'position': 'enemyposition',
                'enemyposition': 'position',
                'championwinrate': 'enemywinrate',
                'enemywinrate': 'championwinrate',
            },
            axis=1))

ErnestBidouille
  • Thanks, added an example to the bottom of the question – Jack Sep 26 '22 at 18:04
  • Is my answer working? With your example, it works for me – ErnestBidouille Sep 26 '22 at 18:12
  • Will check and get back to you – Jack Sep 26 '22 at 18:28
  • Works really well. Could you explain the logic in a sentence or two? For example, why my process was poor and why yours is an improvement – Jack Sep 26 '22 at 19:06
  • With DataFrames, loops are avoided as much as possible for performance; instead we use a technique called vectorization. Take a look at this post, which gives an example: https://stackoverflow.com/a/52674448/12158123 – ErnestBidouille Sep 26 '22 at 21:16
  • In this case, using loc, I select only the rows that interest me: those where `'championname' > 'enemyname'`. I then rename the columns of only those rows, and finally update only those rows in the initial DataFrame – ErnestBidouille Sep 26 '22 at 21:22
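
The select, rename, update steps described in the comments above can be traced on a tiny DataFrame. This is only a sketch: the column names come from the question, but the two sample rows (including the positions and win rates) and the names `swap`, `mask`, and `swapped` are made up for illustration.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the question.
gamestatistics = pd.DataFrame({
    'championname': ['Yasuo', 'Ahri'],
    'position': ['MID', 'MID'],
    'enemyname': ['Akali', 'Zed'],
    'enemyposition': ['TOP', 'MID'],
    'championwinrate': [0.48, 0.52],
    'enemywinrate': [0.52, 0.48],
})

# Each column is mapped to the partner column it trades places with.
swap = {
    'championname': 'enemyname',
    'enemyname': 'championname',
    'position': 'enemyposition',
    'enemyposition': 'position',
    'championwinrate': 'enemywinrate',
    'enemywinrate': 'championwinrate',
}

# Step 1: boolean mask, True where the two names are out of order.
mask = gamestatistics['championname'] > gamestatistics['enemyname']

# Step 2: take only those rows and relabel their columns, so every value
# now sits under its partner's column name.
swapped = gamestatistics.loc[mask].rename(columns=swap)

# Step 3: update() aligns on index and column labels and overwrites the
# matching cells in place; rows outside the mask are left untouched.
gamestatistics.update(swapped)

print(gamestatistics)
```

Here only row 0 ('Yasuo' > 'Akali') is swapped. One caveat worth knowing: `DataFrame.update` only copies non-NA values, so if a swapped row contains NaN in one of the paired columns, that cell would keep its old value.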