
tl;dr (although a tl;dr won't explain everything fully): i have an external program (let's say pid 1234) that is trying to read from another external process (let's say pid 1111). 1111 always reads from its own stdin, but i want 1234 to handle that stdin instead of 1111. the problem is that 1111 sometimes blocks 1234 from reading a byte from /proc/1111/fd/0, which is not desired. i want to know how to make 1234 always read the byte, and 1111 never or only rarely be able to read it

i have been trying to develop a concept for GNU Bash syntax highlighting, as the only other syntax highlighter for bash i found is VERY slow because it reimplements the whole readline library in bash

i ran into some issues but came up with something working, somewhat -- https://gist.github.com/TruncatedDinosour/e2034cf470f268596235a5c88ffcd048

you can find a more in-depth explanation of it at https://blog.ari-web.xyz/b/bash-syntax-highlighting-part-one-concept/ so far i have only asked the general developer public for an answer ('maybe someone knows', i thought to myself), but i think i might get an answer faster here

basically, this concept has a problem: sometimes it misses a byte because, i think, bash steals the read(). so far i have tried

  • using LD_PRELOAD to overwrite the read() function ( 'oh it surely should work' )
    • making it always return 0 ( 'okay if it falsely reads nothing surely itll work' )
    • making it always return 1 ( 'uh, maybe lets make it think we read a single byte even though its empty ?' )
    • redirecting it to a FIFO ( 'maybe just redirecting it to a different type of file would work' )
    • closing it ( not sure what i was thinking here )
  • using os.write / read ( 'maybe directly using unbuffered syscalls would be faster, maybe its just python being slow' )
  • using C++ ( 'okay surely if its a python being slow problem c++ can fix it' )
  • using C ( 'okay, c++ didnt work, maybe c being a simpler language could give me more performance ?' )
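a minimal sketch of the suspected race, under loud assumptions: an ordinary pipe stands in for the terminal, and two threads stand in for bash and the highlighter process. `race_for_bytes` is a hypothetical helper name, not something from the gist. it only illustrates that each byte written to a shared descriptor is delivered to exactly one reader, so whichever reader gets scheduled first wins that byte.

```python
import os
import threading


def race_for_bytes(payload: bytes, readers: int = 2):
    """Let several readers compete for the bytes of one pipe.

    Returns one bytes object per reader holding what that reader got.
    Assumption: threads and a pipe stand in for processes and a terminal.
    """
    r, w = os.pipe()
    got = [bytearray() for _ in range(readers)]

    def reader(slot: int) -> None:
        while True:
            chunk = os.read(r, 1)  # one byte at a time, like the question
            if not chunk:          # empty read: write end closed and pipe drained
                return
            got[slot] += chunk

    threads = [threading.Thread(target=reader, args=(i,)) for i in range(readers)]
    for t in threads:
        t.start()
    os.write(w, payload)
    os.close(w)                    # lets every reader eventually see EOF
    for t in threads:
        t.join()
    os.close(r)
    return [bytes(part) for part in got]


if __name__ == "__main__":
    # How the bytes are split between readers varies from run to run.
    print(race_for_bytes(b"abcdefgh"))
```

the split between the readers changes from run to run, but no byte is ever seen by both, which matches the 'missed byte' symptom.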

so far i have not been able to fully get rid of the issue, so i thought i would ask here as well as the general public. maybe you, fellow people, know how to achieve what im trying to achieve

thanks for any help, ideas or clues in advance :)

Ari157
  • 1
    At a first glance, it seems like this would be a huge security hole if you could hijack an existing process's file handles. – chepner Mar 05 '23 at 14:27
  • @chepner well i mean, i kinda did it already, i just want to know if theres a way to hijack it better xD but, i dont think so, i think its worse when you can hijack privileged process fds, userland processes are fine imo – Ari157 Mar 05 '23 at 14:28
  • 1
    As far as I understand, you aren't so much hijacking its input, but writing to it because stderr is the same terminal; manipulating it using ansi escape code backspaces, and then triggering bash into actually reading it by sending bash an interrupt. It may be worth figuring out exactly what is going on here for educational purposes, but not as a practical concept for a syntax highlighter. – Emanuel P Mar 05 '23 at 15:17
  • @EmanuelP thanks for the response, but itd still be helpful to know how to do what im asking, ive been hacking around it and wasnt able to figure it out, thats why i asked here, once again, thanks – Ari157 Mar 05 '23 at 15:20
  • speaking of syntax highlighting, its pretty much impossible to do it with bash from what i researched without doing a lot of work, like reimplementing readline lib, thats why im making more of a hacky solution which could potentially work – Ari157 Mar 05 '23 at 15:35
  • An interesting problem, but not an ideal presentation (from StackOverflow point of view). Can you include <40 lines of code that demonstrate one (or central) aspect of your project. Something people can copy/paste into their own environment and compile? (I will look at your blog later, but don't expect readers here to go to external sites to understand your problem). All that said, good luck! – shellter Mar 05 '23 at 16:48
  • @shellter thanks for the response, heres the script https://gist.github.com/TruncatedDinosour/e2034cf470f268596235a5c88ffcd048 and thanks for the good luck :) sorry for not responding quickly, i wasnt on my computer – Ari157 Mar 05 '23 at 19:33
  • 1
    Will look at it later. You are in the top 1% of response times! (I see it all the time, people post a Q, readers ask questions in comments, no reply for 24+ hrs!) . Good luck. – shellter Mar 05 '23 at 19:38
  • ive been hacking around and not able to find a proper solution, best i came up with was trying to overcome bash, basically i noticed that the reading allows one reader then the other read bytes, but if you have like 200 readers, its going to be only a 1/200 chance bash will be able to steal the byte from me, so ... i guess thats a discovery, you can see yourself by `for _ in $(seq 200); do /bin/cat /proc/SOME_PID/fd/* & done` – Ari157 Mar 06 '23 at 20:02
  • The only way to get priority over a file descriptor of another process is for that other process not to have that file descriptor. I would go with `wrapper bash` and handle fd to/from bash. `its going to be only a 1/200` You can use a realtime operating system and make your process scheduled always before other process. Otherwise, any number is just luck. – KamilCuk Mar 07 '23 at 20:24
  • @KamilCuk could you explain what you mean ?, i mean, i tried `close(STDIN_FILENO)` with the LD_PRELOAD thing but it didnt go as planned, well idk, ill see what i can do later, as of now im quite tired, so i need time off, ill try to hack around tmrw, thanks <3 – Ari157 Mar 07 '23 at 20:58
  • 1
    If I understand the whole thing correctly, the easy way is to do what `script`, `screen` and similar tools do. Create a `pty`, Start your victim process with stdin/stdout/stderr connected to your `pty`, and read/process/pass-through data through that `pty` and your original stdin/out/err. – Hasturkun Mar 08 '23 at 15:59
  • The people I've seen address this _comprehensively_ founded a startup around it, with tooling that provides its own terminal (so like `screen`/`tmux`/etc, as others have suggested). (No, I don't remember the company name; talked with them briefly last time I was in the middle of a job search, but it wasn't actually a problem space I find interesting -- I like my terminals right out of the 1980s, eff colors, dynamic prompts, &c). – Charles Duffy Mar 08 '23 at 15:59
  • thank you kun and charles for the answers :), i am aware that you can do screen stuff, @Hasturkun 's answer is also very helpful, thank you a lot – Ari157 Mar 08 '23 at 18:46
  • 1
    I think you have to ***parent*** both process in order to prepare each input/output. Using a shell (as a shell;). Maybe the way I address input/output of *script replay* (without syntax coloration) could match some of your goals. In [How to manipulate timing and typescript files created by "script" command?](https://stackoverflow.com/a/71009297/1765658) – F. Hauri - Give Up GitHub Mar 09 '23 at 07:42
  • Not sure I understand well: do you want dynamic syntax highlighting for an interactive shell session? – F. Hauri - Give Up GitHub Mar 10 '23 at 08:23
  • 1
    Have a look at [this comment about *race condition* when merging many output in same file descriptor](https://stackoverflow.com/questions/75546641/bash-awk-parallel-select-process-for-each-line-of-a-huge-file/75548027#comment133485356_75548027), and maybe this [bash IPC Demo](https://stackoverflow.com/a/55761489/1765658) or [bash background process modify global variable](https://stackoverflow.com/a/13209479/1765658) – F. Hauri - Give Up GitHub Mar 10 '23 at 08:29

1 Answer


To better understand your options we need to take a step back and consider how a process gets its I/O.

  • At the lowest level there is the system call sequence: open(2) to obtain the FD, read(2)/write(2) (and a few others) to actually do the I/O, and close(2). These are provided by the kernel directly. To issue them, you need a low-level assembly instruction (SYSCALL/SYSENTER or ARM's SVC).

  • One level above that are the system call wrappers. The above calls (open(2), etc.) are usually made through libc wrappers, which are exported functions that "hide" the underlying SYSCALL/SVC instruction.

  • One level above that are any wrappers provided by Python/Java (e.g. InputStreams, etc.) or other higher-level languages.
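The layers above can be illustrated with a minimal sketch, assuming Python as the higher-level language. `os.open`/`os.read`/`os.close` are thin wrappers over the libc wrappers for open(2)/read(2)/close(2), while the built-in `open()` adds a buffered file object on top. The helper names (`read_low_level`, `read_high_level`, `demo`) and the temporary file are made up for illustration.

```python
import os
import tempfile


def read_low_level(path: str, count: int) -> bytes:
    """open(2) / read(2) / close(2) via Python's thin os.* wrappers."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, count)
    finally:
        os.close(fd)


def read_high_level(path: str, count: int) -> bytes:
    """The same data through the language-level buffered file object."""
    with open(path, "rb") as fh:
        return fh.read(count)


def demo():
    """Write a scratch file, then read its first byte through both layers."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello")
        path = tmp.name
    try:
        return read_low_level(path, 1), read_high_level(path, 1)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo())
```

Both paths end in the same read(2) system call. The difference that matters for interception is which layer a hook would sit in.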

If you want to intercept, you thus have several options:

  • LD_PRELOAD, which you've already tried. This only works under two conditions:

    A) You do it BEFORE the process starts, since LD_* variables are parsed by the dynamic linker.
    B) What you're hijacking is the system call wrapper (or another library function). In other words, if the process issues the low-level assembly instruction inline, this won't work.

  • Hook the actual low-level assembly call/SVC - works every time, but cumbersome (requires effectively debugging the target process, waiting for it to issue a sys call, then trapping all sys calls and filtering out read(2)).

  • Hook at kernel level by redirecting the FD - will always work, but probably an overkill for your requirements, and will also require kernel code execution (commonly, via a kernel module). Mentioned here only for completeness - again, probably N/A and overkill.

  • Dynamic hooking of system calls: using the ptrace(2) API, so you remain in user mode and don't go into kernel mode. The well-known strace(1) tool records sys calls, but cannot intercept them. jtrace - http://NewAndroidBook.com/tools/jtrace.html - can do so using a simple plugin API. This is probably your path of least resistance.

  • Redirect the FD from the target process to the injecting process: either via LD_PRELOAD or some other dynamic injection of code, you can open a pipe, socket, or some other IPC primitive in place of the original FD. dup2(2) is highly useful for that.
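The FD-redirection option can be sketched as follows. This is a minimal illustration, assuming you can start the target yourself (here a forked child stands in for it) rather than inject into an already running process. `run_child_with_pipe_stdin` is a hypothetical helper name. The child's fd 0 is replaced with a pipe via dup2(2), so the only bytes it can ever read on stdin are the ones the parent chooses to forward.

```python
import os


def run_child_with_pipe_stdin(payload: bytes) -> bytes:
    """Fork a child whose fd 0 is a pipe owned by the parent.

    Returns what the child actually read from its stdin.
    """
    in_r, in_w = os.pipe()     # parent -> child stdin
    out_r, out_w = os.pipe()   # child -> parent (report what was read)
    pid = os.fork()
    if pid == 0:
        # Child: swap stdin for the pipe, then behave like the target.
        try:
            os.close(in_w)
            os.close(out_r)
            if in_r != 0:
                os.dup2(in_r, 0)   # fd 0 now refers to the pipe
                os.close(in_r)
            data = os.read(0, len(payload))
            os.write(out_w, data)
        finally:
            os._exit(0)            # never fall back into the parent's code
    # Parent: it alone decides which bytes reach the child's stdin.
    os.close(in_r)
    os.close(out_w)
    os.write(in_w, payload)
    os.close(in_w)
    echoed = os.read(out_r, len(payload))
    os.close(out_r)
    os.waitpid(pid, 0)
    return echoed


if __name__ == "__main__":
    print(run_child_with_pipe_stdin(b"x"))
```

With this arrangement there is no race: the real stdin stays with the parent, and the child never holds a descriptor for it.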

Per your specific issue - when the stdin is a terminal - there are other considerations. Namely, bash (or whichever shell) also manipulates the terminal directly, using ioctl(2) codes (look at stty(1) for examples, too). One other consideration is that the syntax highlighting is done via ANSI escape sequences, which involves writing to the terminal (\e + other codes). This might account for the "stolen byte" you mention you're encountering.
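Since the shell and the external reader share one terminal here, one approach raised in the comments (what `script` and `screen` do) is to give the target its own pseudo-terminal and sit in between. The following is a minimal sketch of that idea using Python's standard `pty` module. `run_under_pty` is a hypothetical helper name, and the `sh -c` command in the usage part is only a stand-in for an interactive shell.

```python
import os
import pty


def run_under_pty(argv, keystrokes: bytes) -> bytes:
    """Run argv with stdin/stdout/stderr on a pty we control.

    The parent holds the master side, so every input byte passes through it
    before the child can see it. Returns everything the child side wrote.
    """
    pid, master = pty.fork()
    if pid == 0:
        # Child: its standard descriptors are already the pty slave.
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)          # only reached if exec failed
    os.write(master, keystrokes)   # forward (or filter) input here
    chunks = []
    while True:
        try:
            data = os.read(master, 1024)
        except OSError:            # Linux reports EIO once the slave is closed
            break
        if not data:               # other systems report EOF instead
            break
        chunks.append(data)
    os.close(master)
    os.waitpid(pid, 0)
    return b"".join(chunks)


if __name__ == "__main__":
    out = run_under_pty(["sh", "-c", "read line; echo got:$line"], b"hi\n")
    print(out)
```

A real wrapper would additionally copy bytes between the original terminal and the master side in a loop, and put the original terminal into raw mode. That part is left out to keep the sketch short.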

TL;DR, like you say - that answers the injection question in the most comprehensive way we can, but the particulars of your syntax highlighting might need to be addressed differently. Be more specific, and we can try to be, as well :-)

Technologeeks
  • currently the best answer i have, thank you, ill give you the +400 in 24 hours if nobody answers :) – Ari157 Mar 08 '23 at 18:42
  • We'd be happy to elaborate further - we're not doing it (just) for the +400 :-) If you need any add'l clarifications or detail, shoot. – Technologeeks Mar 09 '23 at 01:03
  • @Ari157 This is a complex question! I have to leave now, won't be able to take time about this before this week-end! I think 6 days is a good delay, you will get more answers/ideas around your concept if you stay patient. – F. Hauri - Give Up GitHub Mar 09 '23 at 07:49
  • @F.Hauri-GiveUpGitHub thanks for the response, but idk if ill get a better answer, but ill take your word for it, ill wait until next monday – Ari157 Mar 09 '23 at 17:32
  • Again, what other detail do you want? We think this pretty much covers all bases. If you want more details or different aspects, tell us. We'll gladly address them – Technologeeks Mar 10 '23 at 17:32
  • ill give you the +400 – Ari157 Mar 11 '23 at 11:18
  • @Technologeeks idk, im asking for anything – Ari157 Mar 11 '23 at 11:18