1

I have a CSV list of 335 gene accession numbers, and I want to put each of them into a URL of this form:

https://www.ncbi.nlm.nih.gov/nuccore/DQ147858.1?report=fasta

where the 8-character gene accession number (DQ147858 above) is different in each URL and comes from the corresponding entry in the CSV list.

And then I need to also know how to access all the generated URLs with Requests.

Any help is very much appreciated.

Nihilismaa
  • This website is not a coding service. Please try on your own, and when you are stuck on a specific problem do not hesitate to ask here. – Gabriel Devillers Jul 27 '18 at 19:53
  • If you just need to fetch the pages, use a multi-cursor editor to turn each entry of the CSV file into the URL you need, and then use wget or curl to fetch them all. Or use a regex in the editor to transform each line in the CSV into the required URL. – rioV8 Jul 27 '18 at 20:49

3 Answers

1

You can generalize the URL creation with a function:

def build_url(gene):
    return 'https://www.ncbi.nlm.nih.gov/nuccore/' + gene + '.1?report=fasta'

Then, to build a URL for every gene, iterate over the initial list and apply build_url to each entry.

# Generic extraction of the gene list from the CSV
# (extract_list is a placeholder for your own CSV-reading code)
genes = extract_list(csv)

# Using list comprehension
genes_urls = [build_url(gene) for gene in genes]

# Using regular for
genes_urls = []
for gene in genes:
    genes_urls.append(build_url(gene))
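
The extract_list call above is left undefined in this answer. A minimal sketch of what it could look like, assuming the accession numbers sit in a single column of the CSV and there is no header row (the body of extract_list, the csv_path argument and the column parameter are all hypothetical, not something the answer specifies):

```python
import csv

def extract_list(csv_path, column=0):
    """Return the accession numbers found in one column of a CSV file."""
    genes = []
    with open(csv_path, newline="") as f:
        for row in csv.reader(f):
            # Skip blank lines and rows too short to contain the column
            if len(row) > column and row[column].strip():
                genes.append(row[column].strip())
    return genes
```

If the file has a header row, dropping the first element of the returned list (or using csv.DictReader) would be the natural adjustment.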

Following this answer, to make a request, you would simply do:

import requests

# Using list comprehension
res = [requests.get(url) for url in genes_urls]

# Using regular for
res = []
for url in genes_urls:
    res.append(requests.get(url))

Additionally, you can use multithreading to speed up the requests.
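
One way the multithreading suggestion could be sketched is with the standard library's thread pool. The helper name fetch_all and its parameters are illustrative assumptions; the fetch argument is whatever callable performs a single request, for example requests.get:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_all(urls, fetch, max_workers=4):
    """Call fetch(url) for every URL using a small pool of threads.

    Results come back in the same order as urls.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))

# Intended usage with the names from this answer (needs network access):
#   import requests
#   res = fetch_all(genes_urls, requests.get)
```

Keeping max_workers small is a deliberate choice here: with a few hundred URLs a handful of threads already helps, and it avoids sending a burst of simultaneous requests to the server. An exception raised by any single fetch call is re-raised when the results are collected.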

leoschet
1

To read a .csv, I use this:

result = []
with open("file.csv") as f:
    for line in f:
        # strip the trailing newline before splitting on commas
        result.append(line.strip().split(','))

This will give you a list of lists: one list per line, holding the elements between the commas. I don't know which of these elements you need, but take a look at result[0] to see which index you need.

With the index you need,

fmtstr = "https://www.ncbi.nlm.nih.gov/nuccore/{}?report=fasta"
urls = []
for lst in result:
    # desired_index is the column that holds the accession number
    urls.append(fmtstr.format(lst[desired_index]))

Then, you can iterate through the list of urls and use the requests library as you desire.
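
As a rough sketch of that last step, the loop below fetches each URL in turn, records failures instead of stopping, and pauses briefly between requests. The function name fetch_pages and the pause and timeout values are assumptions for illustration; session is expected to be a requests.Session (anything with a compatible get method works):

```python
import time

def fetch_pages(urls, session, pause=0.5, timeout=30):
    """Fetch each URL in turn and return a dict mapping URL -> response text.

    URLs that fail are mapped to None instead of aborting the whole run.
    """
    results = {}
    for url in urls:
        try:
            response = session.get(url, timeout=timeout)
            response.raise_for_status()  # turn HTTP 4xx/5xx into an exception
            results[url] = response.text
        except Exception as exc:
            print(f"failed: {url} ({exc})")
            results[url] = None
        time.sleep(pause)  # brief pause between requests
    return results

# Intended usage (needs network access and the requests package):
#   import requests
#   with requests.Session() as session:
#       pages = fetch_pages(urls, session)
```

Reusing one session keeps the connection open across the 335 requests, and the None entries make it easy to retry only the URLs that failed.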

This isn't the most compact way of doing things, but it's functional and separates steps for simpler viewing.

Gigaflop
0

with open('PATH_TO_CSV', 'r') as csv_file:
    for gene_number in csv_file.read().split(','):
        gene_number = gene_number.strip()  # drop surrounding whitespace and newlines
        URL = 'https://www.ncbi.nlm.nih.gov/nuccore/' + gene_number + '.1?report=fasta'
        # request parsing here

Zach