CS50 Week 6 | DNA | Issues

Question

I have been working on this DNA problem for a while now, and I cannot work out what is causing the problems. Right now the output is wrong, for example the program prints "No match" instead of "Lavender", although I believe the files in the "sequences" folder are being read without problems. check50 also reports a traceback:

 Traceback (most recent call last):
  File "/tmp/tmp7j4mzmff/test2/dna.py", line 83, in <module>
    main()
  File "/tmp/tmp7j4mzmff/test2/dna.py", line 18, in main
    os.chdir(path)
FileNotFoundError: [Errno 2] No such file or directory: '/workspaces/9...

Can you help?

import csv
import sys
import os

def main():
    if len(sys.argv) != 3:
        print("Usage: python dna.py data.csv sequence.txt")
        sys.exit(1)

    database = []
    with open(sys.argv[1], "r") as file:
        reader = csv.DictReader(file)
        for row in reader:
            database.append(row)

    path = "/workspaces/90389241/dna/sequences"
    os.chdir(path)
    for file in os.listdir():
        with open(file, 'r') as file:
            sequence = file.read()


    # TODO: Find longest match of each STR in DNA sequence
    subsequences = list(database[0].keys())[1:]

    finals = {}
    for subsequence in subsequences:
        finals[subsequence] = longest_match(sequence, subsequence)

    # TODO: Check database for matching profiles
    for someone in database:
        count2 = 0
        for subsequence in subsequences:
            if (int(someone[subsequence]) == finals[subsequence]):
                count2 += 1

        if count2 == len(subsequences):
            print(someone["name"])
            return

        print("No match")
        return


def longest_match(sequence, subsequence):
    """Returns length of longest run of subsequence in sequence."""

    # Initialize variables
    longest_run = 0
    subsequence_length = len(subsequence)
    sequence_length = len(sequence)

    # Check each character in sequence for most consecutive runs of subsequence
    for i in range(sequence_length):

        # Initialize count of consecutive runs
        count = 0

        # Check for a subsequence match in a "substring" (a subset of characters) within sequence
        # If a match, move substring to next potential match in sequence
        # Continue moving substring and checking for matches until out of consecutive matches
        while True:

            # Adjust substring start and end
            start = i + count * subsequence_length
            end = start + subsequence_length

            # If there is a match in the substring
            if sequence[start:end] == subsequence:
                count += 1

            # If there is no match in the substring
            else:
                break

        # Update most consecutive matches found
        longest_run = max(longest_run, count)

    # After checking for runs at each character in sequence, return longest run found
    return longest_run


main()
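
One thing visible in the matching loop above: `print("No match")` and its `return` are indented inside the `for someone in database:` loop, so the function gives up as soon as the first row fails to match. Below is a minimal sketch of the alternative, where "No match" is reported only after every row has been checked. The helper name `find_match` and the sample rows are hypothetical and not part of the distribution code.

```python
def find_match(database, counts):
    """Return the name of the first row whose STR counts all equal `counts`,
    or "No match" if no row does.

    `database` is a list of dicts as produced by csv.DictReader (values are
    strings); `counts` maps each STR to the longest run found (ints).
    """
    for person in database:
        # all() is True only if every STR count in this row matches
        if all(int(person[str_name]) == run for str_name, run in counts.items()):
            return person["name"]
    # Reached only after every row has been compared
    return "No match"


# Hypothetical sample rows, for illustration only
rows = [
    {"name": "Alice", "AGATC": "2", "AATG": "8"},
    {"name": "Bob", "AGATC": "4", "AATG": "1"},
]
print(find_match(rows, {"AGATC": 4, "AATG": 1}))  # Bob
print(find_match(rows, {"AGATC": 9, "AATG": 9}))  # No match
```

With the original indentation, the second row here would never be reached, because the first non-matching row already triggers the "No match" branch.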

The first error I got was "KeyError: 0" on the line "subsequences = list(database[0].keys())[1:]", but I don't think there is a problem with accessing the data inside "database". At that point I had used

if len(sys.argv[2]) == "databases/small.csv":
    with open("small.csv", "r") as file:
        reader = csv.DictReader(file)
        for row in reader:
            database.append(row)

elif len(sys.argv[2]) == "databases/large.csv":
    with open("large.csv", "r") as file:
        reader = csv.DictReader(file)
        for row in reader:
            database.append(row)

instead of what I used above.
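
A note on this earlier attempt: `len(sys.argv[2])` is an integer, so comparing it with a string such as `"databases/small.csv"` is always False. Neither branch runs and nothing is appended to `database`. According to the usage message, the CSV file is also the first argument (`sys.argv[1]`), not the second. Below is a minimal sketch that opens whatever path is given, with no branching on the file name. `load_database` is a hypothetical helper name.

```python
import csv


def load_database(csv_path):
    """Read every row of the CSV at `csv_path` into a list of dicts."""
    with open(csv_path, newline="") as file:
        return list(csv.DictReader(file))


# An int never equals a str, so a check like this can never succeed
print(len("databases/small.csv") == "databases/small.csv")  # False
```

In `main`, this would be called as `database = load_database(sys.argv[1])`, which also works when the file is in a different directory from the script.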

Please post code, data, and **results** as text, not screenshots ([how to format code in posts](https://stackoverflow.com/help/formatting)). [Why should I not upload images of code/data/errors?](https://meta.stackoverflow.com/questions/285551/why-should-i-not-upload-images-of-code-data-errors) http://idownvotedbecau.se/imageofcode — Barmar, Sep 01 '23 at 20:07
`dir_name` is probably wrong. You've written it as an absolute pathname, I suspect it's supposed to be relative. — Barmar, Sep 01 '23 at 20:10
I don't understand why dir_name is wrong though, how can I fix it? — Berra Eylül Toprak, Sep 01 '23 at 20:12
It would help if the error message weren't cut off in the middle. I suspect it says something like "File or directory not found". — Barmar, Sep 01 '23 at 20:12
Do you understand the difference between absolute and relative paths? Try removing the first `/`. — Barmar, Sep 01 '23 at 20:13
When I remove the first /, it says no such file or directory in the terminal. When I put the first /, then it says "No match" in every situation. After "File...", there is no more error message, it was cut off by the system itself, unfortunately. — Berra Eylül Toprak, Sep 01 '23 at 20:19
I wasn't blaming you for the cutoff, it's a problem with check50. — Barmar, Sep 01 '23 at 20:20
But without the detailed error message, it's hard to know what needs to be fixed. — Barmar, Sep 01 '23 at 20:24
I have added a more clear error message after changing up the code a little. — Berra Eylül Toprak, Sep 01 '23 at 20:35
Where did you get `/workspaces/90389241/dna/sequences` from? I don't see that anywhere on https://cs50.harvard.edu/x/2023/psets/6/dna/ — Barmar, Sep 01 '23 at 20:39
You must have copied something wrong, because it's not finding the directory. If you have shell access you can use `ls` to figure out which part is wrong. — Barmar, Sep 01 '23 at 20:43
`ls -d /workspaces /workspaces/90389241 /workspaces/90389241/dna /workspaces/90389241/dna/sequences` — Barmar, Sep 01 '23 at 20:44
Welcome to Stack Overflow. Please read [ask] and [How do I ask and answer homework questions?](https://meta.stackoverflow.com/questions/334822), and keep in mind that this is **not a discussion forum**; we expect **one** question per post, and it should be a question that you have **already tried to look up** from existing Stack Overflow questions. — Karl Knechtel, Sep 01 '23 at 20:46
"but I don't think there is a problem with accessing the data inside of "database"" - in your own words, what do you think the `database[0]` part of the code means? What do you think is the **type** of `database` at this point in the code? If you don't already know, did you try to find out what `KeyError` means? What happened when you tried? — Karl Knechtel, Sep 01 '23 at 20:47
Anyway, I gave you a duplicate link for the path issue, because it's the only question here that can actually be made sense of and answered with this description. — Karl Knechtel, Sep 01 '23 at 20:48
The database[0] part of the code accesses the first element in the list "database". This list comes from the csv file. — Berra Eylül Toprak, Sep 01 '23 at 20:55
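
On the path issue discussed in the comments: the traceback shows check50 running `dna.py` from a temporary directory under `/tmp`, where the hard-coded `/workspaces/...` directory does not exist. The loop over `os.listdir()` also overwrites `sequence` on every iteration, so only the last file listed is kept. Below is a minimal sketch, assuming the sequence file is meant to be the one named on the command line, as the usage message suggests. `read_sequence` is a hypothetical helper name.

```python
def read_sequence(sequence_path):
    """Return the contents of the file at `sequence_path`.

    A relative path is resolved against the current working directory,
    so no os.chdir() call is needed. strip() drops a trailing newline.
    """
    with open(sequence_path) as file:
        return file.read().strip()
```

In `main`, `sequence = read_sequence(sys.argv[2])` would replace the `os.chdir(path)` call and the `os.listdir()` loop, so the program reads exactly the file it was asked to read, wherever it is run from.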
