As stated in the title, below is a minimal example of my problem. I am totally new to <fstream> and <string.h>.

Here is the text file (I am only posting a small part of it, since it is huge):

Tokyo   Japan   37977000
Jakarta Indonesia   34540000
Delhi   India   29617000
Mumbai  India   23355000
Manila  Philippines 23088000
Shanghai    China   22120000
Sao Paulo   Brazil  22046000
Seoul   "Korea, South"  21794000
Mexico City Mexico  20996000
Guangzhou   China   20902000
Beijing China   19433000
Cairo   Egypt   19372000
New York    United States   18713220
Kolkata India   17560000
Moscow  Russia  17125000
Bangkok Thailand    17066000
Buenos Aires    Argentina   16157000
Shenzhen    China   15929000
Dhaka   Bangladesh  15443000
Lagos   Nigeria 15279000
Istanbul    Turkey  15154000
Osaka   Japan   14977000
Karachi Pakistan    14835000
Bangalore   India   13707000
Tehran  Iran    13633000
Kinshasa    Congo (Kinshasa)    13528000
Ho Chi Minh City    Vietnam 13312000
Los Angeles United States   12750807
Rio de Janeiro  Brazil  12272000
Nanyang China   12010000
Baoding China   11860000
Chennai India   11324000

Here is my small sample program, which opens and reads the file and then stores the city, country, and population in separate variables.

#include <iostream>
#include <fstream>
#include <string>
#include <string.h>

using namespace std;

int main()
{
    string nameCity1, Country, Population;
    ifstream file("file.txt");
    if(file.is_open())
    {
        while(file >> nameCity1 >> Country >> Population)
        {
                int pos = nameCity1.find('\t');
                string r = nameCity1.substr(0, pos);
                int pos2 = Country.find('\t' + 2);
                string r2 = Country.substr(0, pos2);
                int pos3 = Population.find('\t' + 3);
                string r3 = Population.substr(0, pos3);
                cout << r << "\t" << r2 << "\t" << r3 << "\n";  
        }
        file.close();
    }
    else
    {
        cout << "file is not open" << "\n";
    }
    cin.get();
    return 0;
}

I am pretty sure it is some dumb mistake of mine, so any suggestions or fixes are welcome. Here is what I get when I run the program:

Tokyo   Japan   37977000
Jakarta Indonesia       34540000
Delhi   India   29617000
Mumbai  India   23355000
Manila  Philippines     23088000
Shanghai        China   22120000
Sao     Paulo   Brazil
22046000        Seoul   "Korea,
South"  21794000        Mexico
City    Mexico  20996000
Guangzhou       China   20902000
Beijing China   19433000
Cairo   Egypt   19372000
New     York    United
States  18713220        Kolkata
India   17560000        Moscow
Russia  17125000        Bangkok
Thailand        17066000        Buenos
Aires   Argentina       16157000
Shenzhen        China   15929000
Dhaka   Bangladesh      15443000
Lagos   Nigeria 15279000
Istanbul        Turkey  15154000
Osaka   Japan   14977000
Karachi Pakistan        14835000
Bangalore       India   13707000
Tehran  Iran    13633000
Kinshasa        Congo   (Kinshasa)
13528000        Ho      Chi
Minh    City    Vietnam
13312000        Los     Angeles
United  States  12750807
Rio     de      Janeiro
Brazil  12272000        Nanyang
China   12010000        Baoding
China   11860000        Chennai
India   11324000        Chengdu
China   11309000        Lahore
Pakistan        11021000        Paris
France  11020000        London
United  Kingdom 10979000
Linyi   China   10820000
Tianjin China   10800000
Shijiazhuang    China   10784600
Zhoukou China   9901000
Lima    Peru    9848000
Hyderabad       India   9746000
Handan  China   9549700
Bogota  Colombia        9464000
Weifang China   9373000
Nagoya  Japan   9113000
Wuhan   China   8962000
Heze    China   8750000
Ganzhou China   8677600
Tongshan        China   8669000
Chicago United  States
8604203 Luanda  Angola
8417000 Changsha        China
8394500 Fuyang  China
8360000 Kuala   Lumpur
Malaysia        8285000 Jining
China   8023000 Dongguan
China   7981000 Jinan
China   7967400 Foshan
China   7905700 Hanoi
Vietnam 7785000 Pune
India   7764000 Chongqing
China   7739000 Changchun
China   7674439 Zhumadian
China   7640000 Ningbo
China   7639000 Cangzhou
China   7544300 Nanjing
China   7496000 Hefei
China   7457027 Ahmedabad
India   7410000 Hong
Kong    Hong    Kong
7347000 Zhanjiang       China
7332000 Shaoyang        China
7302400 Hengyang        China
7300600 Khartoum        Sudan
7282000 Nantong China
7282835 Yancheng        China
7260240 Nanning China
7153300 Xi'an   China
7135000 Shenyang        China
7105000 Tangshan        China
7100000 Santiago        Chile
7007000 Zhengzhou       China
7005000 Shangqiu        China
7000000 Yantai  China
6968202 Riyadh  Saudi
Arabia  6881000 Dar
es      Salaam  Tanzania
6698000 Xinyang China
6634000 Shangrao        China
6579714 Luoyang China
6549941 Bijie   China
6537498 Quanzhou        China
6480000 Hangzhou        China
6446000 Miami   United
States  6445545 Huanggang
China   6333000 Maoming
China   6313200 Kunming
China   6250000 Nanchong
China   6183000 Zunyi
China   6127009 Jieyang
China   6089400 Lu'an
China   6090000 Yichun
China   6048700 Madrid
Spain   6026000 Changde
China   6011000 Taizhou
China   5968838 Liaocheng
China   5955300 Qujing
China   5855055 Surat
  • `>>` into a `std::string` skips all leading whitespace and then stores only up to the next whitespace. You need a smarter method to detect quoted strings BEFORE the tokens are all split up on whitespace. – user4581301 May 19 '22 at 17:41
  • `file >> nameCity1` reads the stream until the next separator; space is a separator. Use `std::getline` and then split the whole line. – fabian May 19 '22 at 17:42
  • That file format is really bad for this. How are you going to succeed in splitting up `Los Angeles United States 12750807` properly? – Ted Lyngmo May 19 '22 at 17:45
  • you should read each line using `std::getline` then parse the string – pm100 May 19 '22 at 17:45
  • https://stackoverflow.com/questions/40337177/c-how-to-use-fstream-to-read-tab-delimited-file-with-spaces – pm100 May 19 '22 at 17:47
  • @TedLyngmo "*How are you going to succeed in splitting up `Los Angeles United States 12750807` properly?*" - quite easily, since the values are actually tab-separated: `Los Angeles[tab]United States[tab]12750807` – Remy Lebeau May 19 '22 at 18:01
  • @RemyLebeau Ah... yes that wasn't mentioned in the question - but it's obvious when looking closer at the code. Thanks. – Ted Lyngmo May 19 '22 at 18:06
  • @TedLyngmo it was obvious once I looked at the original data behind the post. Web browsers display tabs as spaces, but the tabs are in the data. – Remy Lebeau May 19 '22 at 18:09
  • @RemyLebeau Hacking the post to get the details is cheating. :-) – Ted Lyngmo May 19 '22 at 18:10
  • @TedLyngmo it is not hacking when I have permissions to edit posts :-) But you are right, this is a detail that should have been stated in the post. – Remy Lebeau May 19 '22 at 18:12

1 Answer

When operator>> is reading a std::string, it stops reading on any whitespace, which includes spaces, tabs, line breaks, etc. As such, file >> nameCity1 will not work for cities like Sao Paulo, Mexico City, New York, etc. And file >> Country will not work for United States, Congo (Kinshasa), etc.
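To make the whitespace behavior concrete, here is a minimal sketch. The helper name tokensFromExtraction is hypothetical and exists only for illustration; it collects every token that operator>> produces from a string, using std::istringstream in place of the file stream.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper, for illustration only: returns every token that
// operator>> extracts from `text`.
std::vector<std::string> tokensFromExtraction(const std::string& text)
{
    std::istringstream iss(text);
    std::vector<std::string> tokens;
    std::string token;

    // Each extraction skips leading whitespace and stops at the next
    // whitespace character, whether that is a space, a tab, or a newline.
    while (iss >> token)
        tokens.push_back(token);

    return tokens;
}
```

Passing the line "Sao Paulo\tBrazil\t22046000" to this helper yields four tokens (Sao, Paulo, Brazil, 22046000) rather than three fields, which is why a loop that reads three strings per iteration drifts out of step with the lines of the file.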

Since your data is line-oriented, I would suggest using std::getline() to read each line separately, and std::istringstream to parse each line.

Also, the data you have presented is actually tab-delimited on each line (which can be difficult to see when a web browser displays tabs as spaces, but I can see the tabs in the source data you posted). You can use std::getline() with '\t' as its delimiter to separate the values in each line, e.g.:

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>

using namespace std;

int main()
{
    ifstream file("file.txt");
    if (file.is_open())
    {
        string line;
        while (getline(file, line))
        {
            istringstream iss(line);
            string City, Country, Population;

            if (getline(iss, City, '\t') && getline(iss, Country, '\t') && iss >> Population)
                cout << City << '\t' << Country << '\t' << Population << '\n';  
        }

        file.close();
    }
    else
    {
        cout << "file is not open" << "\n";
    }

    cin.get();
    return 0;
}
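If the fields need to be stored rather than just printed, the same tab-splitting approach can fill a small record. The following is only a sketch under a few assumptions: the names CityRecord, stripQuotes, and parseCityLine are hypothetical, the population is assumed to fit in a long long, and surrounding double quotes (as in "Korea, South") are assumed to be unwanted in the stored value.

```cpp
#include <sstream>
#include <string>

// Hypothetical record type; the field names are illustrative.
struct CityRecord
{
    std::string city;
    std::string country;
    long long population = 0;
};

// Removes one pair of surrounding double quotes, if present.
std::string stripQuotes(const std::string& s)
{
    if (s.size() >= 2 && s.front() == '"' && s.back() == '"')
        return s.substr(1, s.size() - 2);
    return s;
}

// Parses one tab-delimited line into `out`.
// Returns false if a field is missing or the population is not a number;
// in that case `out` is left unchanged.
bool parseCityLine(const std::string& line, CityRecord& out)
{
    std::istringstream iss(line);
    std::string city, country;
    long long population = 0;

    if (!std::getline(iss, city, '\t') ||
        !std::getline(iss, country, '\t') ||
        !(iss >> population))
        return false;

    out.city = stripQuotes(city);
    out.country = stripQuotes(country);
    out.population = population;
    return true;
}
```

Inside the while (getline(file, line)) loop, each line for which parseCityLine returns true could be pushed into a std::vector<CityRecord>. A line that contains no tabs is rejected instead of being split incorrectly.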
Remy Lebeau