
I have a map that contains Spanish words as keys and the same word in English as the values for each key. I want to translate a string of words from Spanish to English. I'm assuming I'll need to parse the string to separate the words. I don't know how to search through the map keys and then to display the value.

    map<string, string> trans;
    tran["rearrancar"] = "reboot";
    tran["pantalla"] = "screen";
    tran["texto"] = "text";
    tran["virus"] = "virus";
    tran["tinta"] = "ink";
    tran["mitad"] = "half";
    tran["interno"] = "internal";
    tran["memoria"] = "memory";
    tran["papel"] = "paper";
    tran["energia"] = "power";
    tran["fallo"] = "bug";
    tran["pelo"] = "hair";
    tran["el"] = "the";
    tran["dos"] = "two";
    tran["todas"] = "all";
    tran["en"] = "in";
    tran["de"] = "of";
    tran["los"] = "the";
    tran["comprar"] = "buy";
    tran["tarde"] = "afternoon";
    tran["quieres"] = "want";
    tran["muchachos"] = "boys";
    tran["tienen"] = "have";
    tran["ordenador"] = "computer";
    tran["con"] = "with";
    tran["antes"] = "before";
    tran["vacio"] = "empty";
    tran["tu"] = "you";
    tran["hambre"] = "hunger";
    tran["contaminado"] = "corrupt";
    tran["a"] = "to";
    tran["una"] = "a";
    tran["la"] = "the";
    tran["cafe"] = "brown";
    tran["su"] = "your";
    tran["es"] = "is";
    tran["quiero"] = "want";
    tran["vamos"] = "go";
    tran["mi"] = "my";
    tran["barco"] = "ship";
    tran["nosotros"] = "we";
    tran["casa"] = "house";
    tran["yo"] = "I";
    tran["borrar"] = "delete";
    tran["necesita"] = "necessary";
    tran["despues"] = "after";

    string paragraph ("yo quiero una ordenador virus
    todas de los muchachos tienen interno memoria
    mi pelo es cafe
    tu quieres tinta con su papel
    rearrancar el ordenador a vacio el pantalla");

Would it be better to store each word into an array of strings?

Edit: I can now search the map for the word to translate but it crashes after the 4th translated word. I'm sure it has something to do with the parameters in my for loop but I don't know what to put in it.

    stringstream ss(paragraph);
    string word = "";
    for (int i = 0; i < paragraph.length(); i++) {
        getline(ss, word, ' ');
        cout << tran.find(word)->second << " ";
    }

Paragraph is the string containing the paragraph to be translated. Tran is the name of my map containing the Spanish keys and English values.

jgato
  • Keep the Spanish words in a vector. Keep the English words in a vector. Maintain a `std::map` where the key is the index of the Spanish word in the first vector and the value is the index of the English word in the other vector. You can then maintain a Spanish-to-index map. – erip May 01 '18 at 21:17
  • @erip, so two vectors and a map instead of just a map? – ChiefTwoPencils May 01 '18 at 21:19
  • @ChiefTwoPencils Yes. A `std::map` of integral types is going to index much faster. In NLP this is called positional indexing. – erip May 01 '18 at 21:20
  • You don't need to "search" the keys, the point is a map takes the key and returns the value. – ChiefTwoPencils May 01 '18 at 21:22
  • @ChiefTwoPencils Ermmmm, yes... `std::map` is a tree-based implementation, which will require search. And for a large number of strings, this will be a huge performance hit. – erip May 01 '18 at 21:23
  • I don't really understand. I was thinking that while I loop through the string, I could parse it into its own words, search the keys for each word, and then display the value. – jgato May 01 '18 at 21:27
  • Yes. You can do: `english_vector[spanish_vector[spanish_to_index["es"]]]`. – erip May 01 '18 at 21:29
  • Ok, thank you. I'll try that now. – jgato May 01 '18 at 21:37
  • @erip, the point is ***they*** don't have to search, you can `find` elements directly like in your last comment; no? – ChiefTwoPencils May 01 '18 at 21:50
  • Ok, I kept it as a map and I can search the string for the keys and it displays the value but it crashes after translating the 4th word. – jgato May 01 '18 at 22:54
  • @erip I also don't know what you're talking about. Are you saying a linear search through a vector of strings plus a search in a map of integral types is better than just a search in a map of strings? Perhaps you're overthinking a little? The way I understand it is that you would first linear search a vector for a Spanish word, then search the map for its corresponding index in another vector, and then get the English word from that index. How is that faster than just searching a map for a string? – eesiraed May 02 '18 at 03:09
  • Indeed @FeiXiang, the key lookup to get the first index could have simply returned the corresponding word instead of an index to an index. Further, the suggested increase in efficiency for integral types is not real considering you're still doing the key lookup against a string. – ChiefTwoPencils May 02 '18 at 15:24
  • @erip Yes, you're going to need to search a BST to find a string in a map, which can be costly if the map is big. But you would rather do a linear search through an unordered vector of the same size? Think about it. Searching a map for a string takes `O(log n)` comparisons and `O(k log n)` overall if `n` is the size of the map and `k` is the length of the strings. Doing a linear search through an unordered vector takes `O(n)` comparisons and `O(kn)` overall. That is a **very** big difference. You're probably overthinking this since you're used to more complicated stuff in NLP. – eesiraed May 03 '18 at 00:46
  • @FeiXiang I'm not talking about a linear search, but I'm going to put a pin in it. – erip May 03 '18 at 01:19

1 Answer


Your loop runs `paragraph.length()` times (once per character in `paragraph`), but each iteration extracts a whole word, so the stream runs out of words long before the loop ends. See the problem?

Use `while (getline(ss, word, ' '))` instead. `getline` returns the stream it was given, and converting that stream to `bool` is equivalent to `!ss.fail()`. The loop therefore runs until an extraction fails (the end of the stream was reached and nothing was extracted).
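To see the loop condition on its own, here is a minimal sketch that collects the extracted words into a vector. The helper name `split_on_spaces` is hypothetical and not part of the question's code.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): split text on single spaces.
std::vector<std::string> split_on_spaces(const std::string& text)
{
    std::vector<std::string> words;
    std::stringstream ss(text);
    std::string word;
    // getline returns the stream itself; used as a condition, the stream
    // converts to false once an extraction fails, which ends the loop.
    while (std::getline(ss, word, ' '))
    {
        words.push_back(word);
    }
    return words;
}
```

Note that with `' '` as the delimiter, two consecutive spaces produce an empty word and a line break stays attached to the word next to it. If the input may contain either, `while (ss >> word)` is an alternative that skips all whitespace.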

You also never check whether the search for the word in the map failed, so you may end up dereferencing `tran.end()`.
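One way to keep that check in a single place is a small lookup function. This is only a sketch: the name `translate_word` and the `fallback` parameter are assumptions made for illustration.

```cpp
#include <map>
#include <string>

// Hypothetical helper: return the translation of `word`, or `fallback`
// when the word is not a key in the map.
std::string translate_word(const std::map<std::string, std::string>& tran,
                           const std::string& word,
                           const std::string& fallback = "[translation not found]")
{
    auto it = tran.find(word);
    // find returns tran.end() when the key is missing; only dereference
    // the iterator when it points at a real element.
    return it != tran.end() ? it->second : fallback;
}
```

Calling `translate_word(tran, word)` inside the loop would then replace the unchecked `tran.find(word)->second`.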

Other minor problems with your code include `using namespace std`, which can be considered bad practice, and some typos mixing up `tran` and `trans`.

The fixed code:

#include <iostream>
#include <map>
#include <string>
#include <sstream>

int main()
{
    std::map<std::string, std::string> tran;
    tran["rearrancar"] = "reboot";
    tran["pantalla"] = "screen";
    tran["texto"] = "text";
    tran["virus"] = "virus";
    tran["tinta"] = "ink";
    tran["mitad"] = "half";
    tran["interno"] = "internal";
    tran["memoria"] = "memory";
    tran["papel"] = "paper";
    tran["energia"] = "power";
    tran["fallo"] = "bug";
    tran["pelo"] = "hair";
    tran["el"] = "the";
    tran["dos"] = "two";
    tran["todas"] = "all";
    tran["en"] = "in";
    tran["de"] = "of";
    tran["los"] = "the";
    tran["comprar"] = "buy";
    tran["tarde"] = "afternoon";
    tran["quieres"] = "want";
    tran["muchachos"] = "boys";
    tran["tienen"] = "have";
    tran["ordenador"] = "computer";
    tran["con"] = "with";
    tran["antes"] = "before";
    tran["vacio"] = "empty";
    tran["tu"] = "you";
    tran["hambre"] = "hunger";
    tran["contaminado"] = "corrupt";
    tran["a"] = "to";
    tran["una"] = "a";
    tran["la"] = "the";
    tran["cafe"] = "brown";
    tran["su"] = "your";
    tran["es"] = "is";
    tran["quiero"] = "want";
    tran["vamos"] = "go";
    tran["mi"] = "my";
    tran["barco"] = "ship";
    tran["nosotros"] = "we";
    tran["casa"] = "house";
    tran["yo"] = "I";
    tran["borrar"] = "delete";
    tran["necesita"] = "necessary";
    tran["despues"] = "after";

    std::string paragraph(
            "yo quiero una ordenador virusu todas de los muchachos tienen");
    std::stringstream ss(paragraph);
    std::string word;
    while (std::getline(ss, word, ' '))
    {
        auto findResult = (tran.find(word));
        std::cout
                << (findResult != tran.end() ?
                        findResult->second : "[translation not found]") << " ";
    }
}
eesiraed