I would like to create a boost::spirit::qi::grammar for arbitrary-precision integers. Storing an integer as a string feels like a terrible waste of memory, especially when the integer is represented in a binary format. How could I use an arbitrary-precision integer class (e.g. GMP or llvm::APInt) in the structure?
2 Answers
If you have a text file that contains a series of arbitrarily long integers then Qi could certainly be used to parse that file very efficiently into individual numbers, presented as text tokens. How you go about transforming those tokens into GMP numbers is up to you, but I suggest that the mechanism supplied by the library for entering numbers via text is more optimized than anything you are likely to come up with off the top of your head.
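To illustrate the "text tokens" step, here is a minimal sketch in plain standard C++ rather than Qi. The helper name `integer_tokens` is hypothetical; it only locates the digit runs and returns them as views into the original buffer, without copying and without any size limit on the numbers.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (not part of Qi or GMP): returns every maximal run of
// decimal digits in `text`, with an optional leading '-', as a view into the
// original buffer. No digits are converted here, so length is unlimited.
std::vector<std::string_view> integer_tokens(std::string_view text) {
    std::vector<std::string_view> tokens;
    std::size_t i = 0;
    while (i < text.size()) {
        const std::size_t start = i;
        if (text[i] == '-') ++i;                 // optional sign
        const std::size_t first_digit = i;
        while (i < text.size() &&
               std::isdigit(static_cast<unsigned char>(text[i]))) {
            ++i;
        }
        if (i > first_digit) {
            tokens.push_back(text.substr(start, i - start));
        } else if (i == start) {
            ++i;                                 // neither sign nor digit: skip it
        }                                        // a lone '-' was already consumed
        assert(i > start);                       // every iteration makes progress
    }
    return tokens;
}
```

Each token would then be handed to the math library's own text conversion. With GMP that is likely `mpz_set_str`, which expects a null-terminated string, so a token taken from the middle of a buffer would need to be copied first.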
If you are asking whether Qi could be adapted to read a binary file containing arbitrarily long numbers, then the answer is yes. There is support for binary parsers already, see here: http://www.boost.org/doc/libs/1_48_0/libs/spirit/doc/html/spirit/qi/reference/binary.html. I am not sure of the format of your target math lib's integers, but my guess is that you could link these primitives together to read binary representations of your numbers directly. Or you could design your own parser primitive based on one of these.
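As a rough sketch of what "linking the primitives together" could mean, the following standard C++ reads a length-prefixed sequence of 32-bit little-endian words. The wire format is an assumption made up for this example (it is not the native format of GMP or llvm::APInt), and `read_le32` merely plays the role that a Qi primitive such as `qi::little_dword` would play in a real grammar.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Reads one 32-bit little-endian word and advances `p` past it.
// The caller must guarantee that at least 4 bytes are available.
std::uint32_t read_le32(const unsigned char*& p) {
    const std::uint32_t v = std::uint32_t(p[0])
                          | std::uint32_t(p[1]) << 8
                          | std::uint32_t(p[2]) << 16
                          | std::uint32_t(p[3]) << 24;
    p += 4;
    return v;
}

// Assumed (hypothetical) format: a uint32 limb count, followed by that many
// uint32 limbs, least significant limb first, everything little-endian.
// On success `p` is advanced past the number; on failure it is left untouched.
std::optional<std::vector<std::uint32_t>>
read_big_unsigned(const unsigned char*& p, const unsigned char* end) {
    assert(p <= end);
    if (end - p < 4) return std::nullopt;        // no room for the count
    const unsigned char* cur = p;
    const std::uint32_t count = read_le32(cur);
    if (static_cast<std::uint64_t>(end - cur) / 4 < count) {
        return std::nullopt;                     // truncated input
    }
    std::vector<std::uint32_t> limbs;
    limbs.reserve(count);
    for (std::uint32_t i = 0; i < count; ++i) limbs.push_back(read_le32(cur));
    p = cur;
    return limbs;
}
```

The resulting limb vector is the kind of raw word array that a big-integer library can usually import directly (GMP offers `mpz_import` for this purpose), which avoids the intermediate string entirely.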

Looks like Peter bountied the wrong question here.
I have since answered his own question, "Parse arbitrary precision numbers with Boost spirit", which did focus on LLVM's APInt type.
However, in the process I of course also showed how to use it with Boost Multiprecision types.
I would add the following notes to do more justice to the focus of this question:
- When reading from a file, use a file mapping and raw char* instead of istream and stream iterators. I happen to have just demonstrated the speed-up of that in another answer thread (comment link):
  "So here's how to optimize the reading: coliru.stacked-crooked.com/a/0dcb8a05f12a08a5. Now everything in ~1.28s. Looks crazy optimized [...] Only way to shave off a few milliseconds from parsing is s/double_/float_/g but I wouldn't because it removes the genericity from the Graph model." – sehe 3 hours ago
- When working with Boost Multiprecision types in general, you will find you have to disable expression templates. Also, I think true arbitrary-precision types tend not to work well with int_parser<>, but the fixed-precision types all did when I tried.
- Consider not parsing all numbers at all if you don't need to. You can "lazily" parse some files, just detecting e.g. line boundaries or other structural elements. Then, when required, you can parse the fragment of interest in detail.
I have a very detailed answer showing this (Using boost::iostreams::mapped_file_source with std::multimap) on a memory-mapped text file where you're able to binary-search on a multi-gigabyte file without any memory overhead and then only parse the relevant area:
Live On Coliru (including generating test data)
    #define NDEBUG
    #undef DEBUG
    #include <boost/iostreams/device/mapped_file.hpp>
    #include <boost/iterator/iterator_facade.hpp>
    #include <boost/utility/string_ref.hpp>
    #include <boost/optional.hpp>
    #include <boost/spirit/include/qi.hpp>
    #include <cassert>
    #include <iomanip>
    #include <iostream>
    #include <string>
    #include <thread>
    #include <utility>

    namespace io = boost::iostreams;
    namespace qi = boost::spirit::qi;

    // Presents a sorted, line-oriented text region as a read-only multimap.
    // Lines are located by scanning for '\n' and parsed only when dereferenced.
    template <typename Key, typename Value>
    struct text_multi_lookup {
        text_multi_lookup(char const* begin, char const* end)
            : _map_begin(begin), _map_end(end)
        { }

      private:
        friend struct iterator;
        enum : char { nl = '\n' };
        using rawit = char const*;
        rawit _map_begin, _map_end;

        // Scan backwards to the first character of the line containing `it`.
        rawit start_of_line(rawit it) const {
            while (it > _map_begin) if (*--it == nl) return it+1;
            assert(it == _map_begin);
            return it;
        }

        // Scan forwards to one past the newline that ends the line containing `it`.
        rawit end_of_line(rawit it) const {
            while (it < _map_end) if (*it++ == nl) return it;
            assert(it == _map_end);
            return it;
        }

      public:
        struct value_type final {
            rawit beg, end;
            Key   key;
            Value value;

            boost::string_ref str() const { return { beg, size_t(end-beg) }; }
        };

        struct iterator : boost::iterator_facade<iterator, boost::string_ref, boost::bidirectional_traversal_tag, value_type> {

            iterator(text_multi_lookup const& d, rawit it)
                : _region(&d), _data { it, nullptr, Key{}, Value{} }
            {
                assert(_data.beg == _region->start_of_line(_data.beg));
            }

          private:
            friend text_multi_lookup;

            text_multi_lookup const* _region;
            value_type mutable _data;

            // Lazy parse: _data.end stays null until the line is first needed.
            void ensure_parsed() const {
                if (!_data.end)
                {
                    assert(_data.beg == _region->start_of_line(_data.beg));
                    auto b = _data.beg;
                    _data.end = _region->end_of_line(_data.beg);

                    if (!qi::phrase_parse(
                                b, _data.end,
                                qi::auto_ >> qi::auto_ >> qi::eoi,
                                qi::space,
                                _data.key, _data.value))
                    {
                        std::cerr << "Problem in: " << std::string(_data.beg, _data.end)
                                  << "at: " << std::setw(_data.end-_data.beg) << std::right << std::string(_data.beg,_data.end);
                        assert(false);
                    }
                }
            }

            // Start of the line containing the byte halfway between a and b.
            static iterator mid_point(iterator const& a, iterator const& b) {
                assert(a._region == b._region);
                return { *a._region, a._region->start_of_line(a._data.beg + (b._data.beg -a._data.beg)/2) };
            }

          public:
            value_type const& dereference() const {
                ensure_parsed();
                return _data;
            }

            bool equal(iterator const& o) const {
                return (_region == o._region) && (_data.beg == o._data.beg);
            }

            void increment() {
                _data = { _region->end_of_line(_data.beg), nullptr, Key{}, Value{} };
                assert(_data.beg == _region->start_of_line(_data.beg));
            }
        };

        using const_iterator = iterator;

        const_iterator begin()  const { return { *this, _map_begin }; }
        const_iterator end()    const { return { *this, _map_end   }; }
        const_iterator cbegin() const { return { *this, _map_begin }; }
        const_iterator cend()   const { return { *this, _map_end   }; }

        // Binary search over line starts; only the probed lines get parsed.
        template <typename CompatibleKey>
        const_iterator lower_bound(CompatibleKey const& key) const {
            auto f(begin()), l(end());
            while (f!=l) {
                auto m = iterator::mid_point(f,l);

                if (m->key < key) {
                    f = m;
                    ++f;
                }
                else {
                    l = m;
                }
            }
            return f;
        }

        template <typename CompatibleKey>
        const_iterator upper_bound(CompatibleKey const& key) const {
            return upper_bound(key, begin());
        }

      private:
        template <typename CompatibleKey>
        const_iterator upper_bound(CompatibleKey const& key, const_iterator f) const {
            auto l(end());
            while (f!=l) {
                auto m = iterator::mid_point(f,l);

                if (key < m->key) {
                    l = m;
                }
                else {
                    f = m;
                    ++f;
                }
            }
            return f;
        }

      public:
        template <typename CompatibleKey>
        std::pair<const_iterator, const_iterator> equal_range(CompatibleKey const& key) const {
            auto lb = lower_bound(key);
            return { lb, upper_bound(key, lb) };
        }
    };

    int main() {
        io::mapped_file_source map("input.txt");
        text_multi_lookup<double, unsigned int> tml(map.data(), map.data() + map.size());

        auto const e = tml.end();

        for(auto&& line : tml)
        {
            std::cout << line.str();

            auto er = tml.equal_range(line.key);

            if (er.first  != e) std::cout << " lower: " << er.first->str();
            if (er.second != e) std::cout << " upper: " << er.second->str();
        }
    }
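For readers without Boost at hand, the file-mapping and lazy-parsing advice can also be sketched with POSIX `mmap` and `std::from_chars` standing in for `mapped_file_source` and Qi. This is a simplified illustration under those assumptions, not the code from the linked answer; `MappedFile`, `index_lines` and `parse_line` are hypothetical names.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstring>
#include <optional>
#include <stdexcept>
#include <string>
#include <system_error>
#include <vector>

#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// Hypothetical RAII wrapper: maps a whole file read-only (POSIX only).
class MappedFile {
public:
    explicit MappedFile(const std::string& path) {
        const int fd = ::open(path.c_str(), O_RDONLY);
        if (fd < 0) throw std::runtime_error("cannot open " + path);
        struct stat st{};
        if (::fstat(fd, &st) != 0) {
            ::close(fd);
            throw std::runtime_error("cannot stat " + path);
        }
        size_ = static_cast<std::size_t>(st.st_size);
        if (size_ > 0) {  // mmap rejects zero-length mappings
            void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) {
                ::close(fd);
                throw std::runtime_error("cannot map " + path);
            }
            data_ = static_cast<const char*>(p);
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed
    }
    ~MappedFile() { if (data_) ::munmap(const_cast<char*>(data_), size_); }
    MappedFile(const MappedFile&) = delete;
    MappedFile& operator=(const MappedFile&) = delete;

    const char* begin() const { return data_; }
    const char* end() const { return data_ + size_; }

private:
    const char* data_ = nullptr;
    std::size_t size_ = 0;
};

// Cheap structural pass: record where each line starts, convert nothing.
std::vector<const char*> index_lines(const char* first, const char* last) {
    std::vector<const char*> starts;
    while (first < last) {
        starts.push_back(first);
        const void* nl = std::memchr(first, '\n', static_cast<std::size_t>(last - first));
        first = nl ? static_cast<const char*>(nl) + 1 : last;
    }
    return starts;
}

// Detailed pass, on demand: parse the unsigned integer at the start of line i.
// std::from_chars does not skip leading whitespace; a non-numeric line yields nullopt.
std::optional<unsigned long long>
parse_line(const std::vector<const char*>& starts, const char* last, std::size_t i) {
    assert(i < starts.size());
    const char* b = starts[i];
    const char* e = (i + 1 < starts.size()) ? starts[i + 1] : last;
    unsigned long long value = 0;
    const auto res = std::from_chars(b, e, value);
    if (res.ec != std::errc{}) return std::nullopt;
    return value;
}
```

A caller would construct `MappedFile file(path)`, build the index once with `index_lines(file.begin(), file.end())`, and then call `parse_line` only for the lines it actually needs. The index costs one pointer per line; the answer's version avoids even that by locating line starts during the binary search itself.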

- I did find this one after posting my own, I should have deleted mine and bountied this one probably but this is also good. – Peter Apr 26 '21 at 19:06
- @Peter Ah, that makes a lot of sense. And bounties are good karma. Thanks for caring! – sehe Apr 26 '21 at 20:12