
I have a web application written in Go that makes queries to a Postgres database. When I get my records back, I iterate over them with rows.Next and scan each row into a struct with rows.Scan.

How can I make this whole process faster?

I think this program is not very efficient, because with every new record added to the database, the time to scan all the records grows as well. I thought about using goroutines, but I am worried that two goroutines might scan the same data. Can I prevent that with a mutex? But then what is the point of using concurrency, if the mutex lock keeps the other goroutines from accessing the data anyway?

Here is the code I am planning to improve:

func GetUsers() ([]publicUser, error) {
    query := `select user_id, first_name, last_name, registered_at from users;`
    rows, err := db.Query(query)
    if err != nil {
        return nil, err
    }

    var us []publicUser

    for rows.Next() {
        var u publicUser
        if err = rows.Scan(&u.UserId, &u.FirstName, &u.LastName, &u.RegisteredAt); err != nil {
            log.Println(err, "GetUsers")
            return nil, err
        }
        us = append(us, u)
    }
    if err := rows.Err(); err != nil {
        log.Println(err, "GetUsers2")
        return nil, err
    }
    return us, nil
}

Should I fire off a new goroutine like this?

func GetUsers() ([]publicUser, error) {
    query := `select user_id, first_name, last_name, registered_at from users;`
    rows, err := db.Query(query)
    if err != nil {
        return nil, err
    }

    var us []publicUser

    for rows.Next() {
        go func() {
            var u publicUser
            if err = rows.Scan(&u.UserId, &u.FirstName, &u.LastName, &u.RegisteredAt); err != nil {
                log.Println(err, "GetUsers")
                return nil, err
            }
            us = append(us, u)
        }()
    }
    if err := rows.Err(); err != nil {
        log.Println(err, "GetUsers2")
        return nil, err
    }
    return us, nil
}
  • Are you sure the process is too slow for you? The limiting factor here will be the database, not the iterating code on the Go side. I recommend profiling before you try to complicate your code with optimizations. – Eli Bendersky Jun 15 '21 at 15:57
  • No matter what you do, the db will be the bottleneck. You can run a benchmark with a simulated db loop and see. – Burak Serdar Jun 15 '21 at 15:58
  • `rows` is not safe for concurrent use. `rows.Scan` converts the raw data from the database to Go types. The conversion is probably not what limits performance. – Charlie Tumahai Jun 15 '21 at 15:59
  • "...is not very efficient because with every new record to the database, the time to scan all the records will grow as well." -- that's correct if you want to read the WHOLE table every time. Do you really need to do that? Applications rarely need to, except for reporting batch jobs. I fail to see the use case. – The Impaler Jun 15 '21 at 16:07
  • @EliBendersky It is absolutely not slow. This isn't even a production-grade app; I am just trying to get good at programming, and I am not even a computer science student. I will do some research on profiling. Thank you for your response, I appreciate it. I will not complicate the app for minor optimizations. – Abidin Jun 15 '21 at 16:13
  • @CeriseLimón The conversion is not the thing I am worried about. I am worried that after some time, when the db grows, the app will slow down because we will have to iteratively scan ALL of the records. Thank you for your response. – Abidin Jun 15 '21 at 16:15
  • @Abidin My point is that it's not useful to fire off goroutines as shown in the second example, because it does nothing to reduce the cost of reading the entire table. Reducing the time to read all records requires knowledge and tuning of lower-level database features. – Charlie Tumahai Jun 15 '21 at 18:01

1 Answer

People generally solve this problem with pagination. The basic idea is that a client requests n records at a time, and each subsequent request returns the next n records, starting at row n * i + 1 (where i is the zero-based page index you're currently on), i.e. an OFFSET of n * i in SQL terms.

If, for example, you're displaying this user list in a frontend GUI, you would probably want a table that shows 50 results at a time rather than every user in your database, since loading them all would slow down your frontend. Whenever the user clicks through to the next page, the frontend makes a new request to the server asking for 50 results, offset by 50 (since we're now requesting page 2). This approach addresses the true bottleneck in this scenario, which is your database, not your server code. See this resource for more information.

Listable collections should support pagination, even if results are typically small.
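
As a rough illustration (my own sketch, not code from the linked resource), a paginated version of the question's GetUsers might look like the following. It assumes the same publicUser struct and package-level db handle that the question implies, guesses at the field types, and uses the $n placeholders of a Postgres driver such as lib/pq or pgx:

// A sketch only: publicUser's field types are assumed, and the package/db
// setup mirrors what the question implies rather than code we have seen.
package storage // hypothetical package name; the question does not show one

import (
    "database/sql"
    "time"
)

type publicUser struct {
    UserId       int64
    FirstName    string
    LastName     string
    RegisteredAt time.Time
}

var db *sql.DB // opened elsewhere, as in the question

// GetUsersPage returns at most limit users for the given zero-based page.
// OFFSET skips limit*page rows: page 0 returns rows 1..limit, page 1 returns
// rows limit+1..2*limit, and so on.
func GetUsersPage(limit, page int) ([]publicUser, error) {
    // order by gives the pages a stable order; without it Postgres may
    // return rows in any order and pages could overlap between requests.
    query := `select user_id, first_name, last_name, registered_at
              from users
              order by user_id
              limit $1 offset $2;`

    rows, err := db.Query(query, limit, limit*page)
    if err != nil {
        return nil, err
    }
    defer rows.Close() // release the connection back to the pool

    var us []publicUser
    for rows.Next() {
        var u publicUser
        if err := rows.Scan(&u.UserId, &u.FirstName, &u.LastName, &u.RegisteredAt); err != nil {
            return nil, err
        }
        us = append(us, u)
    }
    return us, rows.Err()
}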

Also see this SO answer for an example of paging the results from a SQL query.
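
And, purely as an illustration of the server side (the route and parameter names here are mine, not from the question or the linked answer), the page number could come straight from the request and be passed through to that function:

// Needs "encoding/json", "log", "net/http", and "strconv" in addition to the
// imports from the sketch above.
func usersHandler(w http.ResponseWriter, r *http.Request) {
    // Read the zero-based page number from e.g. GET /users?page=2,
    // falling back to the first page on a missing or bad value.
    page, err := strconv.Atoi(r.URL.Query().Get("page"))
    if err != nil || page < 0 {
        page = 0
    }

    users, err := GetUsersPage(50, page) // 50 results per page, as in the example above
    if err != nil {
        log.Println(err, "usersHandler")
        http.Error(w, "internal server error", http.StatusInternalServerError)
        return
    }

    w.Header().Set("Content-Type", "application/json")
    if err := json.NewEncoder(w).Encode(users); err != nil {
        log.Println(err, "usersHandler")
    }
}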

Clark McCauley