Hello all out there at stackoverflow! My issue is: I want to read in CSV files from IMDb, merge them, add results, and write them out. I can add new columns with calculations, e.g. dividing averageRating by 10 or something like this; that works fine.

But the goal is to classify the data according to the number of votes. The code looks like this:

import numpy
import pandas as pd
import time

df1 = pd.read_csv('imdb_title.csv', sep='\t')
df2 = pd.read_csv('imdb_ratings.csv', sep='\t')

output_csv = 'imdb_result.csv'
df = df1.merge(df2, how='outer')
df = df[df.titleType == 'movie']

for i in df.numVotes:
    if i <= 5000:
        j = 5.9
    elif i <= 25000:
        j = 6.6
    ...
    elif i <= 1000000:
        j = 8.2
    else:
        j = 8.4
    df['estRate'] = j
    print(i, j)
df.to_csv(output_csv, sep=';')

"print(i, j)" prints the correct values, but the output file does not contain them.

Example, wanted vs. actual result:

| numVotes | wanted result | actual result |
|----------|---------------|---------------|
| 30670.0  | 7.2           | 6.6           |
| 4774.0   | 5.9           | 6.6           |
| 20876.0  | 6.6           | 6.6           |
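
A minimal sketch of the behaviour, assuming a toy DataFrame that holds only the three numVotes values from the example and a shortened two-branch stand-in for the if/elif chain: assigning a single float to `df['estRate']` fills the whole column with that value, so each loop pass overwrites every row and only the value from the last pass survives.

```python
import pandas as pd

# Toy frame with the three vote counts from the example (assumption).
df = pd.DataFrame({"numVotes": [30670.0, 4774.0, 20876.0]})

for votes in df["numVotes"]:
    # Shortened stand-in for the full if/elif chain.
    j = 5.9 if votes <= 5000 else 6.6
    # A scalar on the right-hand side is written to EVERY row of the column.
    df["estRate"] = j

# Only the value computed for the last row (20876.0) is left.
print(df["estRate"].tolist())
```

Because `j` is a plain float, copying it makes no difference; the overwrite comes from the column-wide assignment itself.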

After searching and reading numerous articles, I tried to change the line "df['estRate'] = j":

df['estRate'] = j.copy(), but I received the error message "AttributeError: 'float' object has no attribute 'copy'".

Then I tried the copy module:
"df['estRate'] = copy.copy(j)". This runs but has no effect. The last value of the result (6.6) is still the value written to every row of the CSV table.

My understanding is that DataFrames handle assignment differently, which is why I thought I had to use a copy method to make sure the value at that point in the loop is the one that gets stored.

Another attempt was to append data to an open file:

"df.to_csv(output_csv, sep=';', mode='a', header=False)"

but this leads either to n times as many rows (when it is inside the for loop) or to just the last value, as before. What I need is to write only the first, then the second, then the third row of the df.

I tried "for index, i in enumerate..." and then "index.to_csv"; this leads to the error "'int' object has no attribute 'to_csv'".

I also tried df[index] or something like this, but this causes hard errors as well.

Maybe someone has a suggestion for me. I have spent a long time trying different suggestions, but nothing seems to work in my case.

M B S04
  • You lost the formatting so it's not possible to help you without guessing. I'm assuming your code is formatted that `j` has a fixed value and that's assigned to all `df` rows. You may want to write a function that accepts `i` as an input and returns the `j` value. Then do `df['estRate'] = df['numVotes'].apply(your_function_name)` - also what you do is called binning and pandas has a function for that, too – 576i Apr 22 '23 at 12:54
  • Your question needs a minimal reproducible example consisting of sample input, expected output, actual output, and only the relevant code necessary to reproduce the problem. See [How to make good reproducible pandas examples](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) for best practices related to Pandas questions. – itprorh66 Apr 22 '23 at 14:20
  • So easy ... you were right: the value in each row of the 'numVotes' column should map to a fixed float value. With binning, it took just a few minutes. Thanks – M B S04 Apr 22 '23 at 19:57
  • can you please show us example DataFrames of what is imported, and of what you'd like exported? – hlin03 Apr 23 '23 at 03:51
  • @hlin03: the examples are from [link](https://www.imdb.com/interfaces/) - the title.basics and title.ratings db. Output is the table with all rows plus my new one. – M B S04 Apr 23 '23 at 05:39
  • thanks, please see above comment on https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples so we can best assist. Thanks – hlin03 Apr 23 '23 at 10:49
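
The two approaches suggested in the first comment (a function passed to `apply`, and binning) can be sketched as follows. The toy data, the function name `est_rate`, and the column names `estRate_apply` and `estRate_cut` are assumptions for illustration. Only the thresholds visible in the question are used. The branches hidden behind "..." are not reconstructed, so vote counts between 25000 and 1000000 all come out as 8.2 here rather than the finer classes the full code would produce.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the merged IMDb frame (assumption).
df = pd.DataFrame({"numVotes": [4774.0, 20876.0, 1000000.0, 2500000.0]})


def est_rate(votes):
    """Map a vote count to a rating class, using only the thresholds shown in the question."""
    if votes <= 5000:
        return 5.9
    elif votes <= 25000:
        return 6.6
    # The branches elided with "..." in the question would go here.
    elif votes <= 1000000:
        return 8.2
    else:
        return 8.4


# Approach 1: call the function once per row and assign the whole result column at once.
df["estRate_apply"] = df["numVotes"].apply(est_rate)

# Approach 2: binning with pd.cut. Intervals are right-closed by default,
# which matches the "<=" comparisons above.
edges = [-np.inf, 5000, 25000, 1000000, np.inf]
labels = [5.9, 6.6, 8.2, 8.4]
df["estRate_cut"] = pd.cut(df["numVotes"], bins=edges, labels=labels).astype(float)

print(df)
```

Both variants build the full column in one assignment, so no per-row writing to the CSV file is needed. They differ for missing vote counts, which an outer merge can produce: the function falls through to the `else` branch and returns 8.4, while `pd.cut` leaves the value as NaN.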
