Pandas dataframe only writes last value in a loop

Question

Hello all out there at stackoverflow! My issue is: I want to read in CSV files from IMDb, merge them, add computed results, and write them out. I can add new columns with calculations, e.g. dividing averageRating by 10 or something like this, and that works fine.

But the goal is to classify the data according to the number of votes. Code is like this:

import numpy
import pandas as pd
import time

df1 = pd.read_csv('imdb_title.csv', sep='\t')
df2 = pd.read_csv('imdb_ratings.csv', sep='\t')

output_csv = 'imdb_result.csv'
df = df1.merge(df2, how='outer')
df = df[df.titleType == 'movie']

for i in df.numVotes:
    if i <= 5000:
        j = 5.9
    elif i <= 25000:
        j = 6.6
    ...
    elif i <= 1000000:
        j = 8.2
    else:
        j = 8.4
    df['estRate'] = j
    print(i, j)
df.to_csv(output_csv, sep=';')

"print(i, j)" shows the correct value for each row, but the output file does not.

Example: wanted vs. actual result

| numVotes | wanted | numVotes | actual |
|----------|--------|----------|--------|
| 30670.0  | 7.2    | 30670.0  | 6.6    |
| 04774.0  | 5.9    | 04774.0  | 6.6    |
| 20876.0  | 6.6    | 20876.0  | 6.6    |

After searching and reading numerous articles, I tried changing the line that assigns estRate:

df['estRate'] = j.copy(), but I received the error message "AttributeError: 'float' object has no attribute 'copy'".

Then I tried the copy module:
"df['estRate'] = copy.copy(j)" --> this runs but has no effect. The last value of the result (6.6) is still the value written to every row of the CSV table.

I understand that the handling in dataframes is different, and that is why I thought I had to use the copy method, to make sure the value at that point in the loop is the one that gets stored.

Another try was to append the data to an open file:

"df.to_csv(output_csv, sep=';', mode='a', header=False)"

but this leads to n times as many rows (when it is part of the for loop) or to just the last value, as before. What I need is to write only the first, second, third, ... line of the df in each step.

I also tried "for index, i in enumerate(...)" and then "index.to_csv", which leads to the error "'int' object has no attribute 'to_csv'",

or df[index] or something like this, but this also causes hard errors.

Maybe someone has a suggestion for me. I have tried for a long time with different suggestions, but nothing seems to work in my case.

You lost the formatting so it's not possible to help you without guessing. I'm assuming your code is formatted that `j` has a fixed value and that's assigned to all `df` rows. You may want to write a function that accepts `i` as an input and returns the `j` value. Then do `df['estRate'] = df['numVotes'].apply(your_function_name)` - also what you do is called binning and pandas has a function for that, too — 576i, Apr 22 '23 at 12:54
Your question needs a minimal reproducible example consisting of sample input, expected output, actual output, and only the relevant code necessary to reproduce the problem. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for best practices related to Pandas questions. — itprorh66, Apr 22 '23 at 14:20
So easy ... you were right, the value of line i in a row named 'numVotes' and this should lead to a fix float-value. With the bin, it was just a few minutes. Thanks — M B S04, Apr 22 '23 at 19:57
can you please show us example DataFrames of what is imported, and of what you'd like exported? — hlin03, Apr 23 '23 at 03:51
@hlin03: the examples are from [link](https://www.imdb.com/interfaces/) - the title.basics and title.ratings db. Output is the table with all rows plus my new one. — M B S04, Apr 23 '23 at 05:39
thanks, please see above comment on https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples so we can best assist. Thanks — hlin03, Apr 23 '23 at 10:49
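
For later readers, the two suggestions from the first comment could be sketched roughly as below. This is only an illustration under stated assumptions: the sample frame and the helper name `est_rate` are made up, and because the question elides the branches between 25000 and 1000000, the sketch uses only the thresholds that are visible, so values in that gap all get 8.2 here, which will not match the asker's real table. The first assignment also shows why the original loop fails: assigning a scalar to a column broadcasts it to every row, so each loop pass overwrites the whole column and only the last `j` survives.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the merged IMDb frame.
df = pd.DataFrame({"numVotes": [30670.0, 4774.0, 20876.0, 2_500_000.0]})

# Assigning a scalar fills the WHOLE column with that one value.
# Doing this inside a loop therefore keeps only the last value.
df["estRate"] = 6.6


def est_rate(votes):
    """Map a vote count to a rating (hypothetical helper).

    Only the thresholds visible in the question are used; the elided
    intermediate branches are not guessed.
    """
    if votes <= 5000:
        return 5.9
    elif votes <= 25000:
        return 6.6
    elif votes <= 1_000_000:
        return 8.2
    return 8.4


# Option 1: apply the function to each value of the column.
df["estRate"] = df["numVotes"].apply(est_rate)

# Option 2: binning with pd.cut. Intervals are right-closed by default,
# which matches the "<=" comparisons above.
bins = [-np.inf, 5000, 25000, 1_000_000, np.inf]
labels = [5.9, 6.6, 8.2, 8.4]
df["estRate_cut"] = pd.cut(df["numVotes"], bins=bins, labels=labels).astype(float)

print(df)
```

Either column can then be written once, after the assignment, with `df.to_csv(output_csv, sep=';')`. One difference worth knowing: for missing `numVotes` values (which an outer merge can produce), `pd.cut` yields NaN, while the function above falls through to the final `return`.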
