I have two dataframes. When I modify one of them, the other one changes as well. Why could this be?

>untokenized_tweet_tp

                                                     text  ...       screenName
0       [month, open, #postdoc, position, chemical, ch...  ...        VRiffault
1       [hardworking, biofuel, producers, iowa, state,...  ...  LindaWa53201017
3       [today, time, imperative, resort, alternate, s...  ...        ROBRAIPUR
4       [special, gaetanos, beach, club, bell, choosin...  ...    buffbiodiesel
7       [stena, bulk, introduce, low, carbon, shipping...  ...      NPortuarias
                                                   ...  ...              ...
130060  [reseter, elite, vegan, make, unacceptable, ea...  ...      Randy_Anglo
130171  [solar, wind, destroy, supply, limited, output...  ...  RealRichardBail
130331  [renewable, energy, defined, wood, wood, waste...  ...      PaulSchmehl
130375                     [guess, aiding, wood, passion]  ...     GraceIrene21
130384  [homogenous, white, state, diversity, propagan...  ...      Randy_Anglo
[52411 rows x 3 columns]
>for i in tweet_tp.index.values:
...     tweet_tp.text[i] = TreebankWordDetokenizer().detokenize(tweet_tp.text[i])
... 
>untokenized_tweet_tp
                                                     text  ...       screenName
0       month open #postdoc position chemical characte...  ...        VRiffault
1       hardworking biofuel producers iowa state worki...  ...  LindaWa53201017
3       today time imperative resort alternate sources...  ...        ROBRAIPUR
4       special gaetanos beach club bell choosing #rec...  ...    buffbiodiesel
7        stena bulk introduce low carbon shipping options  ...      NPortuarias
                                                   ...  ...              ...
130060  reseter elite vegan make unacceptable eat meat...  ...      Randy_Anglo
130171  solar wind destroy supply limited output backe...  ...  RealRichardBail
130331  renewable energy defined wood wood waste munic...  ...      PaulSchmehl
130375                          guess aiding wood passion  ...     GraceIrene21
130384  homogenous white state diversity propaganda wi...  ...      Randy_Anglo
[52411 rows x 3 columns]

Notice I never mentioned untokenized_tweet_tp inside the for loop.

>type(tweet_tp)
<class 'pandas.core.frame.DataFrame'>
>type(untokenized_tweet_tp)
<class 'pandas.core.frame.DataFrame'>

untokenized_tweet_tp was first created like this: untokenized_tweet_tp=tweet_tp

Diggy.

1 Answer

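untokenized_tweet_tp=tweet_tp

This is the key. The assignment does not create a new dataframe. It only gives the existing dataframe a second name, so both names refer to the same object.

If you do not want changes to tweet_tp to affect untokenized_tweet_tp then do

untokenized_tweet_tp=tweet_tp.copy()

Otherwise any change you make through one name will show up under the other.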
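To make the difference concrete, here is a minimal sketch. The two-row dataframe and the names alias and independent are made up for illustration, and " ".join stands in for TreebankWordDetokenizer().detokenize so the snippet runs without nltk.

```python
import pandas as pd

# Hypothetical stand-in for the tweet_tp dataframe from the question
tweet_tp = pd.DataFrame({"text": [["guess", "aiding"], ["wood", "passion"]]})

alias = tweet_tp               # plain assignment: a second name for the same object
independent = tweet_tp.copy()  # .copy(): a separate DataFrame

print(alias is tweet_tp)        # True
print(independent is tweet_tp)  # False

# Replace each token list with a joined string
# (" ".join is an assumption standing in for the detokenizer)
tweet_tp["text"] = tweet_tp["text"].apply(" ".join)

print(alias.loc[0, "text"])        # guess aiding
print(independent.loc[0, "text"])  # ['guess', 'aiding']
```

The is check is a quick way to confirm whether two variables point at the same dataframe.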

For the concept behind this, the question "why should I make a copy of a data frame in pandas" should be a good reference.
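One caveat that may matter here, since the text column holds Python lists: as far as I know, .copy() duplicates the dataframe's data but does not recursively copy Python objects stored in the cells. The sketch below uses made-up data and names (original, copied) to show that mutating a list in place is still visible through both dataframes, while replacing the cell value, as the loop in the question does, is not.

```python
import pandas as pd

# Hypothetical one-row dataframe with a list in the cell
original = pd.DataFrame({"text": [["wood", "passion"]]})
copied = original.copy()

# Both cells still reference the same list object
print(original.loc[0, "text"] is copied.loc[0, "text"])

# In-place mutation of that shared list shows up in both dataframes
copied.loc[0, "text"].append("energy")
print(original.loc[0, "text"])

# Replacing the whole column only affects the dataframe it is done on
copied["text"] = copied["text"].apply(" ".join)
print(type(original.loc[0, "text"]))
print(type(copied.loc[0, "text"]))
```

If fully independent lists are needed, copying each list explicitly (for example with the standard library's copy.deepcopy on the cell values) is one option.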

adir abargil