Questions tagged [data-harvest]
24 questions
2
votes
2 answers
Using PHP/JavaScript link to get information about site visitor
Somebody is trying to phish me; they are pretending to be one of my close friends to humiliate both of us. This person has created a fake email account impersonating my friend and is trying to get personal info out of me. I made sure with my friend…

XplozionMan
- 21
- 1
- 5
2
votes
3 answers
Automatically pressing a "submit" button using python
The bus company I use runs an awful website (Hebrew, English) which makes a simple "From A to B timetable today" query a nightmare. I suspect they are trying to encourage the use of the costly SMS query system.
I'm trying to harvest the entire…

Adam Matan
- 128,757
- 147
- 397
- 562
1
vote
4 answers
How to validate GoogleBot
I want to prevent data harvesting on my site (except Googlebot, of course).
I am guessing that relying on Googlebot's User-Agent is not strong enough (any bot can fake it).
How can I still authenticate Googlebot to avoid fakes?

Himberjack
- 5,682
- 18
- 71
- 115
1
vote
0 answers
DCAT RDF Harvesting errors
I tried DCAT RDF harvesting in CKAN. General CKAN harvesting works well, but DCAT RDF harvesting does not seem to offer many configuration features. How can I resolve this error and get data from https://www.europeandataportal.eu?
Received harvest…

solgit
- 11
- 3
1
vote
0 answers
Harvesting from THREDDS using GeoNetwork
I have a THREDDS instance: https://wci.earth2observe.eu/thredds/catalog-earth2observe.html and I am looking for a way to get the data in an ISO-19115 standard format. I have tried many solutions and am currently trying to get the information into a…

ojstandeven
- 31
- 6
1
vote
1 answer
How can I display an XML page instead of JSON, for a dataset
I am using the pycsw extension to produce a CSW file. I have harvested data from one CKAN instance [1], into another [2], and am now looking to run the pycsw 'paster load' command:
paster ckan-pycsw load -p /etc/ckan/default/pycsw.cfg -u [CKAN…

ojstandeven
- 31
- 6
1
vote
1 answer
Harvest php API array to json
I am using the Harvest PHP API (http://mdbitz.com/harvest-api/examples/), and my Harvest PHP array prints the following data:
$myresult = $harvestAPI->getUser($client_id);
$data = $myresult->get( "data" );
print_r($data);
data:
Harvest_User Object (…
user4081850
0
votes
2 answers
character (0) after scraping webpage in read_html
I'm trying to scrape "1,335,000" from the screenshot below (the number is at the bottom of the screenshot). I wrote the following code in R.
t2<-read_html("https://fortune.com/company/amazon-com/fortune500/")
employee_number <- t2 %>%
…

Xian Zhao
- 81
- 1
- 11
0
votes
2 answers
Web scraping with R: with multiple dropdown menu
I am trying to scrape data from the following websites, which have 4 dropdown menus. After selecting from each dropdown menu, they show a table from which I want to scrape data. I want to combine the information from all tables across all dropdown menus.
I am using…

Roy A
- 1
0
votes
1 answer
Link redirection problem - Web Scraping in R using Rvest
While scraping links from a news site using rvest, I often stumbled upon links that redirect to other links. In those cases, I could only scrape the first link, while the second link was the one that actually contained the data. For…

GBLucas
- 35
- 4
0
votes
1 answer
How to put a variable based on your build version in DefineConstants for a Harvest with HeatDirectory?
I need to harvest a directory with the WiX Toolset, but this directory will be named after the build version number.
I know how to define a static constant, but is it possible to make a variable one?
I searched on forums, but never found a harvest based on…

Bruno Dubout
- 5
- 1
0
votes
0 answers
In the CKAN web interface I cannot see datasets in the list, only in the activity stream; is an extra module needed?
I am using OAI-PMH harvesting to send metadata from DSpace to CKAN. I can see the files (packages) in the activity stream but not in the dataset list. The link in the activity stream leads to the data in the DSpace interface. I also see the entries for the packages in the DB.…

Juri
- 1
0
votes
1 answer
Data Harvesting in R: Get nested lists, unlist, make edits, re-nest them back
The following code harvests data from a website. I retrieve a list of lists; I want to unlist one of the lists, edit it, then re-nest it back into the data in the form in which it was received. Here is my code below; it fails on the…

Susie
- 9
- 1
- 5
0
votes
1 answer
Nested function to retrieve data from multiple URLs (with authentication) in R
My code below is designed to retrieve data (and its metadata) with authentication through an API endpoint, and return all metadata into a dataframe. I want to create a nested function to repeat this same process for another API endpoint with the…

Susie
- 9
- 1
- 5
0
votes
0 answers
Get Total size of data harvested by heat tool
I have used Heat.exe, provided by WiX.
It takes a copy of the directory structure, which is called harvesting a directory.
What I want is to get the total size of the data that it harvests.
Is there a solution for this? Please help.
Thanks in advance.

Ashish Rana
- 135
- 1
- 11