17

I'm attempting to learn Rust, and a recent problem I've encountered is the following: given a String whose length is an exact multiple of n, I want to split it into chunks of size n, insert a space between the chunks, and then collect the result back into a single string.

The issue I ran into is that the chars() method returns the Chars struct, which for some reason doesn't implement the SliceConcatExt trait, so chunks() can't be called on it.

Furthermore, once I've successfully created a Chunks struct (by calling .bytes() instead) I'm unsure how to call a .join(' ') since the elements are now Chunks of byte slices...

There has to be an elegant way to do this I'm missing.

For example here is an input / output that illustrates the situation:

given: whatupmyname, 4
output: what upmy name

This is my poorly written attempt:

let n = 4;
let text = "whatupmyname".into_string();
text.chars()
    // compiler error on chunks() call
    .chunks(n)
    .collect::<Vec<String>>()
    .join(' ')
Chayim Friedman
Zeke
  • I don't know what you're referring to as the "size" of a string. Into how many chunks can you split `é` (one codepoint, two bytes)? What about `e̊` (two codepoints, three bytes)? What about `` (two codepoints, eight bytes)? – trent Jul 15 '19 at 03:37
  • Possible duplicate of [Creating a sliding window iterator of slices of chars from a String](https://stackoverflow.com/questions/51257304/creating-a-sliding-window-iterator-of-slices-of-chars-from-a-string) – hellow Jul 15 '19 at 05:32
  • @trentcl that's fair. I guess I should specify that in this case I'm only worried about ASCII characters, i.e. the 128 characters that can each be represented by a single byte. That is more limited, but simple enough for my purposes. – Zeke Jul 16 '19 at 23:58
  • @hellow this is indeed very close to creating a sliding window over a string, but I think this case varies because I'm trying to create chunks instead and then collect these chunks into strings. I'm encountering issues on both sides: the chunking is indeed possible once the String has been converted into a Vec, but the collecting is still troublesome. – Zeke Jul 17 '19 at 00:09
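
The distinction raised in the comments between bytes and code points can be checked directly. This is only a minimal sketch; `byte_and_char_len` is a hypothetical helper introduced for illustration, not something from the question.

```rust
/// Hypothetical helper: returns (number of UTF-8 bytes, number of chars) for `s`.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // U+00E9 (precomposed é): one code point, two bytes.
    assert_eq!(byte_and_char_len("\u{E9}"), (2, 1));
    // 'e' followed by U+030A (combining ring above): two code points, three bytes.
    assert_eq!(byte_and_char_len("e\u{30A}"), (3, 2));
    // Pure ASCII: one byte per char, so byte chunks and char chunks coincide.
    assert_eq!(byte_and_char_len("whatupmyname"), (12, 12));
}
```

For ASCII-only input, as in the example, chunking by bytes and chunking by chars give the same result.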

4 Answers

24

The problem here is that chars() and bytes() return Iterators, not slices. You could use as_bytes(), which gives you a &[u8]. However, you cannot directly get a &[char] from a &str, because a &str stores only the UTF-8 bytes; each char has to be decoded by scanning the bytes to see how many of them make it up. You'd have to do something like this:

text.chars()
    .collect::<Vec<char>>()
    .chunks(n)
    .map(|c| c.iter().collect::<String>())
    .collect::<Vec<String>>()
    .join(" ");

However, I would NOT recommend this as it has to allocate a lot of temporary storage for Vecs and Strings along the way. Instead, you could do something like this, which only has to allocate to create the final String.

text.chars()
    .enumerate()
    .flat_map(|(i, c)| {
        if i != 0 && i % n == 0 {
            Some(' ')
        } else {
            None
        }
        .into_iter()
        .chain(std::iter::once(c))
    })
    .collect::<String>()

This stays as iterators until the last collect, by flat_mapping with an iterator that is either just the character or a space and then the character.
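
For anyone who wants to run it, the flat_map approach might be wrapped up roughly as follows. This is a sketch under assumptions: the function name `space_every` and the `n > 0` check are mine, not part of the snippet above.

```rust
use std::iter;

/// Hypothetical helper: inserts a space between consecutive chunks of `n` chars
/// (counting chars, not bytes). Assumes `n > 0`.
fn space_every(text: &str, n: usize) -> String {
    assert!(n > 0, "chunk size must be positive");
    text.chars()
        .enumerate()
        .flat_map(|(i, c)| {
            // Emit a leading space at each chunk boundary except the very start.
            let sep = if i != 0 && i % n == 0 { Some(' ') } else { None };
            sep.into_iter().chain(iter::once(c))
        })
        .collect()
}

fn main() {
    assert_eq!(space_every("whatupmyname", 4), "what upmy name");
    // A length that is not a multiple of n leaves a shorter final chunk.
    assert_eq!(space_every("abcde", 2), "ab cd e");
}
```

Because it counts chars rather than bytes, this version should also behave sensibly on non-ASCII input, although a "char" is still a code point and not necessarily what a reader perceives as one character.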

JayDepp
  • I had tried the first suggestion prior to posting (although I felt uneasy about making unnecessary Vecs), but I had encountered a compiler error on the `collect::<Vec<String>>()` call. Something stating that a `Vec<String>` couldn't be built from an iterator over `&[char]`, which kind of struck me as odd. EDIT: running it now, it states that the trait `FromIterator<&[char]>` is not implemented for `Vec<String>`, so I would think that maybe I can implement that trait? – Zeke Jul 17 '19 at 00:02
  • Fixed it. I wouldn't recommend using that code though, it has to allocate way too much. BTW, you wouldn't be able to implement that trait because the trait and types involved are both not "yours". – JayDepp Jul 17 '19 at 00:18
  • Oh I see. I thought there would be a way to implicitly convert the Chunks of chars into strings, but a map would do that for you. Also, the flat_map concept is a little foreign to me but I'm going to attempt to deconstruct it: - the flat_map normally flattens nested structures, but in this case it's used to return an iterator - if you're on the nth character, insert an iterator that wraps a space, and chain it into the current iterator so that it comes before it. Else, None will be turned into an iterator which will produce nothing. - collect the iterators into a string – Zeke Jul 17 '19 at 01:28
3

So if you want to work from a list of chars to create a String, you can use fold for that.

Something like this:

text.chars()
    .enumerate()
    .fold(String::new(), |acc, (i, c)| {
        // Start a new chunk (with a leading space) every n characters.
        if i != 0 && i % n == 0 {
            format!("{} {}", acc, c)
        } else {
            format!("{}{}", acc, c)
        }
    })
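
A variant of the same fold that pushes onto the accumulator, instead of building a fresh String with format! on every step, could look like the sketch below. The function name `space_chunks_fold`, the capacity estimate, and the `n > 0` check are assumptions added for illustration.

```rust
/// Hypothetical helper: fold-based chunk spacing that reuses one accumulator String.
/// Assumes `n > 0`.
fn space_chunks_fold(text: &str, n: usize) -> String {
    assert!(n > 0, "chunk size must be positive");
    // Reserve roughly enough room for the text plus the separators up front.
    let capacity = text.len() + text.len() / n;
    text.chars()
        .enumerate()
        .fold(String::with_capacity(capacity), |mut acc, (i, c)| {
            if i != 0 && i % n == 0 {
                acc.push(' ');
            }
            acc.push(c);
            acc
        })
}

fn main() {
    assert_eq!(space_chunks_fold("whatupmyname", 4), "what upmy name");
}
```

The accumulator is moved through each step rather than copied, so only one String should be allocated here.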
sterfield
  • Oh interesting, I like this solution as well and appreciate the time you took to respond. This is very much a clear and understandable solution that I should've thought of. Thank you! I'm curious whether there's string allocation overhead compared to the flat_map solution JayDepp posted. – Zeke Aug 04 '19 at 16:01
  • So, unfortunately, there is. `format!` creates a new `String` and returns it. So for each character, a new `String` is created, containing the previous one plus the current character and a space if needed. In the end, you'll end up with the same `String`, but there is quite some overhead because of the multiple intermediate `String`s. The above method is better, because `flat_map` flattens the small per-character iterators into a single iterator of `char`s that is only `collect`ed at the very end. So only one final `String` is created. – sterfield Aug 05 '19 at 18:20
2

If the chunk size is fixed and every chunk boundary falls on a UTF-8 character boundary (always true for ASCII input), you can chunk the bytes directly:

use std::str;

fn main() {
    let subs = "&#8204;&#8203;&#8204;&#8203;&#8204;&#8203;&#8203;&#8204;&#8203;&#8204;".as_bytes()
        .chunks(7)
        .map(str::from_utf8)
        .collect::<Result<Vec<&str>, _>>()
        .unwrap();
        
    println!("{:?}", subs);
}

// >> ["&#8204;", "&#8203;", "&#8204;", "&#8203;", "&#8204;", "&#8203;", "&#8203;", "&#8204;", "&#8203;", "&#8204;"]
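
Applied to the example from the question, the same byte-chunking idea might be sketched as follows. The helper name `space_byte_chunks` is hypothetical, and the sketch assumes `n > 0` (slice `chunks` panics on a chunk size of 0).

```rust
use std::str::{self, Utf8Error};

/// Hypothetical helper: splits `text` into `n`-byte chunks and joins them with spaces.
/// Returns an error if a chunk boundary falls inside a multi-byte character.
fn space_byte_chunks(text: &str, n: usize) -> Result<String, Utf8Error> {
    let chunks = text
        .as_bytes()
        .chunks(n) // panics if n == 0
        .map(str::from_utf8)
        .collect::<Result<Vec<&str>, _>>()?;
    Ok(chunks.join(" "))
}

fn main() {
    assert_eq!(space_byte_chunks("whatupmyname", 4).unwrap(), "what upmy name");
    // 'é' is two bytes, so a 2-byte chunk boundary splits it and decoding fails.
    assert!(space_byte_chunks("h\u{E9}llo", 2).is_err());
}
```

Returning a Result instead of calling unwrap() lets the caller decide what to do when the input turns out not to be ASCII.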
Esteban Borai
2

Such a simple task can be solved with a single loop:

fn main() {
    let n = 4;
    let text = "whatupmyname";
    let mut result = String::new();

    for (i, c) in text.chars().enumerate() {
        // Add the separator before each new chunk, so no trailing space is left.
        if i != 0 && i % n == 0 {
            result.push(' ');
        }
        result.push(c);
    }
    println!("{:?}", result); // "what upmy name"
}
hkBst
Kaplan