I am trying to build the html5-parser package for my Windows 10 Python environment. I have read the instructions at https://html5-parser.readthedocs.io/en/latest/ but find them unclear.

The script used to build the package is readily available from the GitHub repo, but it doesn't work when run from the directory where it resides in that project. The instructions referenced above assume the script will be run on a Windows continuous integration server, which my Python development environment is not.

I don't know how to proceed from here, and I'm looking for step-by-step instructions on how to build this package.

Elliott
  • Did not `python -m pip install html5-parser` work? – Prayson W. Daniel Apr 29 '21 at 05:17
  • If you look at the link I included in the post, pip install only works in Linux environments. I have a Windows 10 environment. The procedure for installing the html5-parser in that environment is the problem. It is not at all clear how to do this on a normal development machine, not a CI server – Elliott Apr 30 '21 at 21:36
  • Installing a package that was not designed for it is always a pain. Have you thought about developing "remotely" in WSL2? For VS Code there are some quite convenient solutions. Also, maybe `html5lib` supplies functions that might help you? – araisch Jan 24 '22 at 14:52
  • wsl doesn't work either. has same issue where you can't build the binaries. in fact might be harder due to how they're linked. – byteface Jan 24 '22 at 18:01
  • prebuilt binaries online for libxml2 etc don't have the lib files and say so in the readme. but there's a specific process here to build them and then link them for use, right?... https://github.com/kovidgoyal/html5-parser/blob/master/.github/workflows/win-ci.py so can't that be modded to run on a Windows machine, not a windows-ci server? – byteface Jan 24 '22 at 18:08

1 Answer

This is not a final solution, but I want to post some steps that may help you get to one.

First, you need to install the MSVC build tools. Follow this tutorial and make sure you have all the packages.

Then clone the html5-parser library: https://github.com/kovidgoyal/html5-parser.git

Move win-ci.py from subfolder .github\workflows to the root folder of the repo (where setup.py is).

Edit win-ci.py and comment out the following lines in the functions install_deps() and build():

#env = query_vcvarsall()
#os.environ.update(env)
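
If you would rather not edit the file by hand, the change can be sketched in Python. This is only an illustration: `comment_out` and `TARGETS` are names I made up, and it assumes the two statements appear in win-ci.py exactly as quoted above.

```python
# Hypothetical helper (not part of html5-parser): comments out the two
# statements named in this answer wherever they appear in a source string.
TARGETS = ("env = query_vcvarsall()", "os.environ.update(env)")


def comment_out(source, targets=TARGETS):
    """Prefix '#' to every line whose stripped text equals one of targets."""
    out = []
    for line in source.splitlines(keepends=True):
        stripped = line.lstrip()
        if stripped.rstrip() in targets:
            # Keep the original indentation so the function body stays valid.
            indent = line[: len(line) - len(stripped)]
            out.append(f"{indent}#{stripped}")
        else:
            out.append(line)
    return "".join(out)


# Usage, run from the repo root after moving win-ci.py there:
#   from pathlib import Path
#   script = Path("win-ci.py")
#   script.write_text(comment_out(script.read_text()))
```

Lines that are already commented out are left alone, so running it twice does no harm.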

Now open a command line in the repo directory and run:
python win-ci.py install

This should install all dependencies.

After this, running python win-ci.py should execute the build() function and finish installing the library.

Unfortunately, I had trouble getting the paths to work properly.

After installing the build tools I didn't have nmake.exe on my PATH, so I had to add its directory manually:
C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools\VC\Tools\MSVC\14.29.30133\bin\Hostx64\x64
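
The MSVC version folder (14.29.30133 here) will differ between installations, so instead of hard-coding it you could search for nmake.exe. A rough sketch, where `find_tool_dirs` and `prepend_to_path` are hypothetical helper names of my own and the Hostx64\x64 layout is assumed from the path above:

```python
import os
from pathlib import Path


def find_tool_dirs(root, tool="nmake.exe", host_target=("Hostx64", "x64")):
    """Return directories under root that hold tool and end in host_target.

    Results are sorted as paths in descending order, which is only a
    rough stand-in for "newest version first".
    """
    root = Path(root)
    if not root.is_dir():
        return []
    hits = [p.parent for p in root.rglob(tool)
            if p.parent.parts[-2:] == tuple(host_target)]
    return sorted(hits, reverse=True)


def prepend_to_path(directory, env=None):
    """Put directory at the front of PATH in env (os.environ by default)."""
    env = os.environ if env is None else env
    current = env.get("PATH", "")
    entries = current.split(os.pathsep) if current else []
    if str(directory) not in entries:
        env["PATH"] = os.pathsep.join([str(directory), *entries])
    return env["PATH"]


# Usage (the root is the install location from this answer; adjust as needed):
#   root = r"C:\Program Files (x86)\Microsoft Visual Studio\2019\BuildTools"
#   dirs = find_tool_dirs(root)
#   if dirs:
#       prepend_to_path(dirs[0])
```

This only changes PATH for the current Python process and anything it launches, so it would have to run inside win-ci.py (or a wrapper that calls it) to have any effect on the build.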

Then, in the build() step, I got stuck on a missing stddef.h.
The header is installed in C:\Program Files (x86)\Windows Kits\10\Include\10.0.19041.0\ucrt, but I still couldn't make it work.
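
One possible cause, which I have not confirmed: the MSVC compiler looks for headers in the directories listed in the INCLUDE environment variable, and with the query_vcvarsall() lines commented out that variable may never be set unless the script is started from a Developer Command Prompt. A small check along these lines would at least show whether the ucrt folder is visible to the compiler (`find_header` is a hypothetical helper, not something from win-ci.py):

```python
import os
from pathlib import Path


def find_header(header, include_value, sep=os.pathsep):
    """Return the directories in an INCLUDE-style string that contain header."""
    dirs = [Path(d) for d in include_value.split(sep) if d.strip()]
    return [d for d in dirs if (d / header).is_file()]


# Usage on the build machine:
#   hits = find_header("stddef.h", os.environ.get("INCLUDE", ""))
#   print(hits or "stddef.h is not reachable through INCLUDE")
```

If the list comes back empty, adding the ucrt directory above to INCLUDE (or running from a Developer Command Prompt) would be the next thing to try.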

E_net4
Domarm
  • this is a good idea. i will paste some of my notes also, which included using Chocolatey for pkgconfig etc – byteface Jan 24 '22 at 19:54