Questions tagged [data-harvest]

24 questions
2
votes
2 answers

Using PHP/JavaScript link to get information about site visitor

Somebody is trying to phish me; they are pretending to be one of my close friends to humiliate both of us. This person has created a fake email account impersonating my friend and is trying to get personal info out of me. I made sure with my friend…
XplozionMan
  • 21
  • 1
  • 5
2
votes
3 answers

Automatically pressing a "submit" button using python

The bus company I use runs an awful website (Hebrew, English) which makes a simple "From A to B timetable today" query a nightmare. I suspect they are trying to encourage the usage of the costly SMS query system. I'm trying to harvest the entire…
Adam Matan
  • 128,757
  • 147
  • 397
  • 562
1
vote
4 answers

How to validate GoogleBot

I want to prevent data harvesting on my site (except by Googlebot, of course). I am guessing that relying on Googlebot's User-Agent is not strong enough (any bot can fake it). How can I still authenticate Googlebot to avoid fakes?
Himberjack
  • 5,682
  • 18
  • 71
  • 115
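
For readers skimming this listing: one commonly described way to tell a real Googlebot from a faked User-Agent is a reverse DNS lookup on the requesting IP, followed by a forward lookup on the returned hostname to confirm it maps back to the same IP. The sketch below is only an illustration under stated assumptions, not the asker's code or an accepted answer. The function name `is_verified_googlebot`, the injectable resolver parameters, and the exact suffix list are all hypothetical choices made for this example.

```python
import socket

# Assumption: genuine Googlebot hostnames end in one of these suffixes.
# Check Google's current crawler documentation before relying on this list.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse_lookup(ip):
    """Return the primary hostname for an IP address (PTR lookup)."""
    return socket.gethostbyaddr(ip)[0]


def _forward_lookup(host):
    """Return every address the hostname resolves to."""
    return [info[4][0] for info in socket.getaddrinfo(host, None)]


def is_verified_googlebot(ip, reverse=_reverse_lookup, forward=_forward_lookup):
    """Reverse-then-forward DNS check; the resolvers are injectable for testing."""
    try:
        host = reverse(ip).rstrip(".").lower()
        # Step 1: the PTR name must sit under an expected Google domain.
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # Step 2: the hostname must resolve back to the original IP,
        # otherwise anyone controlling their own PTR record could spoof it.
        return ip in forward(host)
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses;
        # treat any lookup failure as "not verified".
        return False
```

In a request handler this would be called with the client's IP only when the User-Agent claims to be Googlebot, and the result would usually be cached, since DNS lookups on every request are slow.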
1
vote
0 answers

DCAT RDF Harvesting errors

I tried DCAT RDF harvesting with CKAN. General CKAN harvesting works well, but DCAT RDF harvesting does not seem to add many configuration features. How can I resolve this error and get data from https://www.europeandataportal.eu? Received harvest…
solgit
  • 11
  • 3
1
vote
0 answers

Harvesting from THREDDS using GeoNetwork

I have a THREDDS instance: https://wci.earth2observe.eu/thredds/catalog-earth2observe.html and I am looking for a way to get the data in an ISO-19115 standard format. I have tried many solutions and am currently trying to get the information into a…
1
vote
1 answer

How can I display an XML page instead of JSON, for a dataset

I am using the pycsw extension to produce a CSW file. I have harvested data from one CKAN instance [1], into another [2], and am now looking to run the pycsw 'paster load' command: paster ckan-pycsw load -p /etc/ckan/default/pycsw.cfg -u [CKAN…
1
vote
1 answer

Harvest php API array to json

I am using the Harvest PHP API (http://mdbitz.com/harvest-api/examples/), and my Harvest PHP array prints the following data: $myresult = $harvestAPI->getUser($client_id); $data = $myresult->get( "data" ); print_r($data); data: Harvest_User Object (…
user4081850
0
votes
2 answers

character (0) after scraping webpage in read_html

I'm trying to scrape "1,335,000" from the screenshot below (the number is at the bottom of the screenshot). I wrote the following code in R. t2<-read_html("https://fortune.com/company/amazon-com/fortune500/") employee_number <- t2 %>% …
Xian Zhao
  • 81
  • 1
  • 11
0
votes
2 answers

Web scraping with R: with multiple dropdown menu

I am trying to scrape data from the following websites with 4 dropdown menus. After clicking each dropdown menu, a table is shown from which I want to scrape data. I want to combine information from all tables from all dropdown menus. I am using…
Roy A
  • 1
0
votes
1 answer

Link redirection problem - Web Scraping in R using Rvest

While I was web scraping links from a news site using Rvest tools, I often stumbled upon links that redirect to other links. In those cases, I could only scrape the first link, while the second link was the one that actually contained data. For…
GBLucas
  • 35
  • 4
0
votes
1 answer

How to put a variable based on your build version in DefineConstants for a Harvest with HeatDirectory?

I need to harvest a directory with the WiX Toolset, but this directory will be named after the build version number. I know how to define a static constant, but is it possible to make a variable one? I searched on forums, but never found a harvest based on…
0
votes
0 answers

In the CKAN web interface I cannot see datasets in the list, only in the activity stream. Is an extra module needed?

I am using the OAI-PMH harvester to send metadata from DSpace to CKAN. I can see the files (packages) in the activity stream but not in the dataset list. The link in the activity stream leads to the data in the DSpace interface. I also see the entries for the packages in the DB…
Juri
  • 1
0
votes
1 answer

Data Harvesting in R: Get nested lists, unlist, make edits, re-nest them back

The following code harvests data from a website. I retrieve a list of lists; I want to unlist one of the lists, edit it, then re-nest it back into the data in the form in which the data was received. Here is my code below; it fails on the…
Susie
  • 9
  • 1
  • 5
0
votes
1 answer

Nested function to retrieve data from multiple URLs (with authentication) in R

My code below is designed to retrieve data (and its metadata) with authentication through an API endpoint, and return all metadata into a dataframe. I want to create a nested function to repeat this same process for another API endpoint with the…
Susie
  • 9
  • 1
  • 5
0
votes
0 answers

Get Total size of data harvested by heat tool

I have used Heat.exe, provided by WiX. It takes a copy of the directory structure, which is called harvesting a directory. What I want is to get the total size of the data it harvests. Is there a solution? Please help. Thanks in advance.
Ashish Rana
  • 135
  • 1
  • 11