0

In this case I want to read integers from standard input such that they are separated by spaces and newline. My first attempt was similar to the following code:

fn splitter(x: String) -> impl Iterator<Item=&'static str> {
    x.as_str().split_whitespace()
}

fn valuereader<A: std::str::FromStr>() -> impl Iterator<Item=A> 
where <A as std::str::FromStr>::Err: std::fmt::Debug
{
    let a = std::io::stdin().lines();
    let b = a.map(Result::unwrap);
    let c = b.flat_map(splitter);
    c.map(|x|x.parse().expect("Not an integer!"))
}

fn main() {
    let temp: Vec<usize> = valuereader().collect();
    println!("{:?}", temp);
}

The problem is that split_whitespace wants a &str, but std::io::stdin().lines() returns an owned String. I don't want to use x.as_str().split_whitespace().collect(), because I don't want to allocate a temporary vector.

The best solution I could come up with was to use a wrapper that contains the owned String and the iterator that depends on the String, using unsafe code. The wrapper's implementation of Iterator is simply a wrapper for the iterator that depends on the String. This was the result:

mod move_wrapper {
    use std::pin::Pin;
    pub fn to_wrapper<'b, A: 'b, F, B: 'b> (a: A, f: F) -> Wrapper<A,B>
    where
        F: FnOnce (&'b A) -> B
    {
        let contained_a = Box::pin(a);
        // Here is the use of unsafe. It is necessary to create a reference to a that can live as long as long as needed.
        // This should not be dangerous as no-one outside this module will be able to copy this reference, and a will live exactly as long as b inside Wrapper.
        let b = f(unsafe{&*core::ptr::addr_of!(*contained_a)});
        Wrapper::<A,B> {_do_not_use:contained_a, dependent:b}
    }

    pub struct Wrapper<A,B> {
        _do_not_use: Pin<Box<A>>,
        dependent: B
    }

    impl<A,B: Iterator> Iterator for Wrapper<A,B>
    {
        type Item = B::Item;
        fn next(&mut self) -> Option<Self::Item> {
            self.dependent.next()
        }
    }
}

fn splitter(x: String) -> impl Iterator<Item=&'static str> {
    move_wrapper::to_wrapper(x, |a|a.as_str().split_whitespace())
}

fn valuereader<A: std::str::FromStr>() -> impl Iterator<Item=A> 
where <A as std::str::FromStr>::Err: std::fmt::Debug
{
    let a = std::io::stdin().lines();
    let b = a.map(Result::unwrap);
    let c = b.flat_map(splitter);
    c.map(|x|x.parse().expect("Not an integer!"))
}

fn main() {
    let temp: Vec<usize> = valuereader().collect();
    println!("{:?}", temp);
}

Now to the actual question. How would you do this as idiomatic as possible, if possible without using any unsafe code (does the function here called to_wrapper exist)? Have I written safe unsafe code? Is there any way to make my Wrapper work for all traits, not just Iterator?

EDIT

To be clearer, this question is about creating a method you can apply anytime you have to give ownership to something that wants a reference, not about how to read from standard input and parse to integers.

  • Unsafe code refers to code within an `unsafe` context, that is, code that has been explicitly marked unsafe. All this to say that there is no "risk" of accidentally writing unsafe code. – jthulhu Apr 11 '23 at 19:24
  • @jthulhu Did I say it is any risk I accidentally write unsafe code? In this case I have done it deliberately. The question is if I have managed to do it in a safe way. – true equals false Apr 11 '23 at 19:34
  • The `unsafe` in your code is hard to spot. You should at the least have `// SAFETY` comment explaining why you think it's sound. – PitaJ Apr 11 '23 at 19:36
  • BTW: `lines()` is slow because it has to allocate a new `String` for each line, it's better to use `read_line` directly. But of course, then you can't use `Iterator`. – PitaJ Apr 11 '23 at 19:40
  • @PitaJ: Though the OP may still want to do that; if they're so concerned with performance that `x.as_str().split_whitespace().collect()` is objectionable due to allocating a temporary `Vec`, presumably the allocation activity for a `String` per line is also an issue. Mind you, this is almost certainly premature optimization; even on the fastest modern hard drives, I/O would almost certainly swamp any allocator overhead, so I'd just do the convenient thing until I had evidence it was causing problems. – ShadowRanger Apr 11 '23 at 19:56
  • `x.as_str().split_whitespace().collect()` wouldn't work either, because the slices would be referencing the `String` that's dropped at the end of `splitter`. – PitaJ Apr 11 '23 at 19:58
  • @ShadowRanger The reason I am doing it the more complicated way is because I am learning rust and want to get some experience of the most idiomatic methods. In this case it may not be much more efficient, but in other cases the difference may be more prominent. Then the experience is really useful. – true equals false Apr 11 '23 at 20:03
  • @PitaJ `x.as_str().split_whitespace().map(|a|a.to_owned()).collect()` would work. – true equals false Apr 11 '23 at 20:10
  • Sure but that's just adding allocations, a `String` for each number. It would be better to just do the parsing there instead. – PitaJ Apr 11 '23 at 20:15
  • Messing around with lazy iterators like this is fun, but if you're just going to `collect` in the end, just do everything eagerly: [playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=c9c89d84cc6d81ce1225621fb1173c07) – PitaJ Apr 11 '23 at 20:18
  • @PitaJ One reason not to do it that way is if you are going to read multiple lines of input several times. I am not sure if this will work if you change how the buffering of the terminal works, but there may be some use-cases: `let temp: Vec = valuereader().take(10).collect();` and then `let temp2: Vec = valuereader().collect();`. But you would probably do it the way you did it if you actually had to read several lines of integers. – true equals false Apr 11 '23 at 20:27

1 Answers1

0

You can't create self-referential types in safe rust, as explained here. Unfortunately, fixing this issue still leaves the next one.

Iterators that return items that can't outlive the iterator are impossible, explained here. Yours is even more restrictive: it's trying to create items that can't exist by the time the next item is fetched, which means you need a lending iterator to make this in its current state.

It would be quite easy to create your Vec with some nested for loops:

fn valuereader<A: FromStr>() -> Vec<A>
where
    A::Err: Debug,
{
    let mut v = Vec::new();
    for line in std::io::stdin().lines() {
        for word in line.unwrap().split_whitespace() {
            let a = word.parse().unwrap();
            v.push(a);
        }
    }
    v
}

However, this is not very instructive, and creates many temporary allocations, especially if you only need an iterator and not a Vec.

In order to make your original idea work idiomatically, you need an iterator that produces owned items. Fortunately, your final item type is usize (or anything that comes out of parse, which has to be owned), so you can create an iterator that creates those. This only allocates one String, which will grow to the length of the longest line. (playground)

use std::fmt::Debug;
use std::io::BufRead;
use std::marker::PhantomData;
use std::str::FromStr;

#[derive(Debug, Clone)]
struct ValueReader<B, V> {
    // The underlying BufRead
    buffer: B,
    // The current line being read
    line: String,
    // The current offset into the current line
    index: usize,
    // The type being parsed
    _value_type: PhantomData<V>,
}

impl<B, V> ValueReader<B, V> {
    fn new(b: B) -> Self {
        Self {
            buffer: b,
            line: String::new(),
            index: 0,
            _value_type: PhantomData,
        }
    }

    fn value(&mut self) -> Option<V>
    where
        V: FromStr,
        V::Err: Debug,
        B: BufRead,
    {
        loop {
            // Check if line is consumed, or the iterator just started
            if self.line.is_empty() {
                let bytes = self.buffer.read_line(&mut self.line).unwrap();
                // Buffer is completely consumed
                if bytes == 0 {
                    return None;
                }
            }

            let unconsumed = self.line[self.index..].trim_start();
            self.index = self.line.len() - unconsumed.len();

            let Some(word) = unconsumed.split_whitespace().next() else {
                // Line is consumed, reset to original state
                self.index = 0;
                self.line.clear();
                continue;
            };
            self.index += word.len();

            return Some(word.parse().unwrap());
        }
    }
}

impl<B, V> Iterator for ValueReader<B, V>
where
    V: FromStr,
    V::Err: Debug,
    B: BufRead,
{
    type Item = V;

    fn next(&mut self) -> Option<Self::Item> {
        self.value()
    }
}

This could be made more efficient by using fill_buf and consume to only read one word at a time, shortening the max length of the String. It would also be sensible to report errors instead of unwrapping.

drewtato
  • 6,783
  • 1
  • 12
  • 17
  • With my edit of the question it should be clear that this does not really answer the question. But it at least answers a big part of the old question! – true equals false Apr 11 '23 at 21:47