0

So i was given two excel files which is like this example

movieId   title                 genres
1       Toy Story (1995)        Adventure|Animation|Children|Comedy|Fantasy
2       Jumanji   (1995)        Adventure|Children|Fantasy
3       Grumpier Old Men (1995) Comedy|Romance
4       Waiting to Exhale (1995)    Comedy|Drama|Romance
5       Father of the Bride Part II (1995)  Comedy

the thing I'm trying to make is when someone types in a title the code would find the movieID and the movie name. The only problem is I have no idea where to start I'm a noob coder and I've been trying my best to learn but i have no idea, if you guys can help me and point me in the right direction that would be amazing.

Thank you

2 Answers2

3

Here's how you'd do it in openpyxl, since you included the openpyxl tag in your question:

import openpyxl as xl

workbook = xl.load_workbook(filename="test.xlsx")

title_column_name = "title"

# Get the active worksheet
ws = workbook.active

# The String we'll search for. You could prompt the user to provide
# this using python2's raw_input, oder python3's input function.
searchstring = "Grumpier"

# ws.rows[1:] means we'll skip the first row (the header row).
for row in ws.rows[1:]:
    # row[1] is the title column. string.find(str) returns -1
    # if the value was not found, or the index in the string if
    # the value was found.
    if row[1].value.find(searchstring) != -1:
        print("Found a matching row! MovieId={0}, Title={1}".format(row[0].value, row[1].value))

Output:

Found a matching row! MovieId=3, Title=Grumpier Old Men (1995)
Georg Grab
  • 2,271
  • 1
  • 18
  • 28
  • This is a good start but it is probably best to always create an index of the contents to search from. – Charlie Clark Mar 15 '17 at 14:45
  • What does the != 1 and title_column_name = "title" do? – Shay Merlin Cormac Mar 16 '17 at 23:11
  • Also if i were to input more than one movie how would i change the code to print all the movies – Shay Merlin Cormac Mar 17 '17 at 04:14
  • @ShayMerlinCormac oops, `title_column_name` does nothing at all in this code. You could use that variable if the location of your title column were unknown (do a horizontal search on the header row first until the row matches `title_column_name`) – Georg Grab Mar 17 '17 at 09:13
  • for the `!= -1` please consult the python docs: https://docs.python.org/2/library/string.html#string.find – Georg Grab Mar 17 '17 at 09:15
  • for multiple searchstrings you could change `searchstring` into a list with multiple entries and add a second `for` loop inside the `for row in ws.rows[1:]` loop where you iterate through that searchstring list – Georg Grab Mar 17 '17 at 09:16
  • If i were only to grab the Movie ID and use the same code to look for the id in another excel file. How would I do that? – Shay Merlin Cormac Mar 28 '17 at 11:56
  • also can you explain to me what != -1 mean cause i cannot find it in the docs – Shay Merlin Cormac Mar 28 '17 at 12:17
1

Okay, since you're a noob coder, I'll explain it to you in a simple way that doesn't actually require any libraries. Also I'm going to assume you are using movie title and move name interchangeably.

First, you can transform an excel file into a .csv, which stands for comma separated file (via excel, just save as, select csv. You can do it via google sheets too). What is a csv file? It's like the excel file except every row is on a line by itself and different columns are separated by commas. So the first three lines in your csv would be:

movieId,title,genres
1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
2,Jumanji   (1995),Adventure|Children|Fantasy

Now, the .csv can be read as a regular file. You should read them in line by line. Here is the python doc for that. It's pretty straight forward.

Now that you have every line as a string, we can split them via the string.split() command. We need to split using the comma as a delimiter since it's a comma separated file. So far our code is something like this (I assume you read the different lines of the csv into the lines arrays):

lines = [...] # a list of strings which are the different lines of the csv
name_im_looking_for = "move you like" # the movie you're looking for
for(l in lines):
    columns = l.split(',')
    id = columns[0]
    name = columns[1]
    if(name.find(name_im_looking_for) != -1): 
        # this means the name you're looking for is within the 'name' col
        print "id is", id, "and full name is", name

This is just a crude way to do it, but if you're really new to programming should help you get on your way! If you have any questions, feel free to ask (and if you're actually good and you just want to know how to use openpyxl, please specify so in your question).

Emad Y
  • 415
  • 4
  • 14
  • 1
    Why convert to CSV? It's possible to do this in openpyxl. – Charlie Clark Mar 15 '17 at 14:41
  • because he mentioned he's a noob coder, and he probably just added the tag because a quick google told him to. If he's looking for an openpxl solution, he should probably mention that. If not, I believe this is more informational, especially for a **noob** coder. I might be wrong, but that's my interpretation. – Emad Y Mar 15 '17 at 16:07
  • He specifically mentions Excel files so you add a layer of complexity with a the format change. More importantly, however, the question is a duplicate. – Charlie Clark Mar 15 '17 at 17:51