30

I am working on an ASP.NET project with C# and SQL Server 2008.

I have three tables:

Users, DataFields, DataFieldsValues

Each user has a specific value for each data field, and this value is stored in the DataFieldsValues table.

Now I want to display a report that looks like this:

[report screenshot: a table with a Username column followed by one column per data field]

I have created the classes User and DataField. The DataField class has a method string GetValue(User user), which returns the value of that field for a given user.

Then I have the list of users, List<User> users, and the list of data fields, List<DataField> fields, and I do the following:

string html = string.Empty;
html += "<table>";
html += "<tr><th>Username</th>";
foreach (DataField f in fields)
{
   html += "<th>" + f.Name + "</th>";
}
html += "</tr>";

foreach (User u in users)
{
   html += "<tr><td>" + u.Username + "</td>";
   foreach (DataField f in fields)
   {
      html += "<td>" + f.GetValue(u) + "</td>";
   }
   html += "</tr>";
}
html += "</table>";
Response.Write(html);

This works fine, but it is extremely slow, and I am talking about 20 users and 10 data fields. Is there any better way in terms of performance to achieve this?

EDIT: For each property of these classes, I retrieve the value using the following method:

public static string GetDataFromDB(string query)
{
    string return_value = string.Empty;
    using (SqlConnection sql_conn = new SqlConnection(ConfigurationManager.ConnectionStrings["XXXX"].ToString()))
    {
        sql_conn.Open();
        SqlCommand com = new SqlCommand(query, sql_conn);
        try
        {
            return_value = com.ExecuteScalar().ToString();
        }
        catch (Exception)
        {
            // swallow errors and return an empty string
        }
    }
    return return_value;
}

For instance:

public User(int _Id)
{
    this.Id = _Id;
    // one query per property: Username here, and similar calls for the rest
    this.Username = DBAccess.GetDataFromDB("select Username from Users where Id=" + this.Id);
    //...
}
enb081
  • Why don't you use a list or data view control to display the data? You wouldn't have to write the for loops and td tags for it. – Jack Gajanan Dec 05 '13 at 11:47
  • 1
    I hope you're not opening/closing your connections inside the loops, because that will drastically slow down the data retrieval – Mana Dec 05 '13 at 11:47
  • I edited my question to show how I access data from the database – enb081 Dec 05 '13 at 11:52
  • 6
    @enb081 One thing you can do is use `StringBuilder` in place of string. – Suraj Singh Dec 05 '13 at 12:02
  • 3
    @enb081 Why not use sql joins to get all the data in one query? – scheien Dec 05 '13 at 12:12
  • How can I use joins in one query with my database structure? In that case, I would have to use cursors, which would make it even slower... – enb081 Dec 05 '13 at 12:50
  • I agree with Scheien. It looks like you can do this with one query fairly simply. Select * from Users left join DataFieldValues left join DataFields. – Keith Dec 10 '13 at 19:44
  • 1
    There are many great comments and suggestions on this page. But if you are interested in getting your code to perform quickly while keeping your schema and your code mostly intact, then take a look at my answer. Of course there are many other things you can do to improve your schema and such, but they're a bit more involved and will take more time to understand (though there's nothing wrong with learning new things). – Chris Leyva Dec 13 '13 at 14:29
  • 2
    It's also worth noting that on top of everything else, the way that you are composing your strings is inherently slow (`O(n^2)`). You would be better off using the StringBuilder class instead. -- Oops, Chris Lava already got this one. – RBarryYoung Dec 14 '13 at 15:49
  • 1
    A note for enb081. I am not disparaging app developers vs. db developers. Based on your not liking EF and your non-response to the SQL solution, I am guessing that you are an app dev. EF is Microsoft's method (among others) to give app developers the simplification and optimization that db developers can do. There are a lot of things that db developers suggest that would benefit you (even though their solutions sound foreign). – Robert Co Dec 14 '13 at 15:57
  • 1
    Thank you all for the suggestions. Much appreciated! – enb081 Dec 14 '13 at 17:09

10 Answers

23

Here are 2 suggestions that will help. The first is what will improve your performance significantly; the second will also help, though it probably won't make your app much faster in your case.

Suggestion 1

You call the method GetDataFromDB(string query) very often. This is bad because you create a new SqlConnection and SqlCommand each time. This takes time and resources. Also, if there is any network delay, that is multiplied by the number of calls you are making. So it's just a bad idea.

I suggest that you call that method once and have it populate a collection like a Dictionary<int, string> so that you can quickly look up your Username value from the user id key.

Like this:

// In the DataField class, have this code.
// This call should query the database once for all user ids and usernames and
// return a Dictionary<int, string> where the key is the Id and the value is the
// username. Make this a field of the DataField class. (Note that the question's
// GetDataFromDB returns a single string, so a row-reading variant is needed;
// a sketch follows after this block.)
Dictionary<int, string> usernameDict = GetDataFromDB("select id, username from Users");

// Then in the GetValue(int userId) method, do this:
public string GetValue(int userId)
{
    // Add some error handling and whatnot. 
    // And a better name for this method is GetUsername(int userId)
    return this.usernameDict[userId];
}
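
A minimal sketch of such a row-reading loader, assuming the same connection-string setup as the question's GetDataFromDB (the method name LoadUsernamesFromDB is illustrative, not part of the original answer):

// Hypothetical helper: one query, one connection open/close, all usernames cached.
// Uses System.Configuration, System.Data.SqlClient and System.Collections.Generic.
public static Dictionary<int, string> LoadUsernamesFromDB()
{
    var usernames = new Dictionary<int, string>();
    using (var conn = new SqlConnection(ConfigurationManager.ConnectionStrings["XXXX"].ToString()))
    using (var cmd = new SqlCommand("select Id, Username from Users", conn))
    {
        conn.Open();
        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            while (reader.Read())
            {
                usernames[reader.GetInt32(0)] = reader.GetString(1);
            }
        }
    }
    return usernames;
}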

Suggestion 2

Here is another way you can improve things, though only slightly in this case: use the StringBuilder class. In general, StringBuilder gives significant performance gains over repeated string concatenation (here is an overview: http://support.microsoft.com/kb/306822).

StringBuilder sb = new StringBuilder();
sb.Append("<table><tr><th>Username</th>");
foreach (DataField f in fields)
{
    sb.Append("<th>" + f.Name + "</th>");
}

// Then, when you need the string
string html = sb.ToString();

Let me know if you need some more clarification, but what you are asking for is very do-able. We can work this out!

If you make these 2 simple changes, you will have great performance. I guarantee it.

Chris Leyva
  • 2
    I opened only one SqlConnection to the database and saved it in session. This dramatically improved performance. StringBuilder also improved it. This answer has both of the suggestions, although they are all mentioned in the other answers too. Thank you very much for all your valuable advice! – enb081 Dec 14 '13 at 17:07
  • @enb081 - Putting an open database connection in Session is a bad idea. The SQL Server database driver in .NET has built-in connection pooling. By storing open connections in session you're reimplementing this feature in a much less efficient and effective way. I think what the other posters were trying to say is that you need to reduce the number of queries you perform against the database and not the number of open connections. – Aaron Carlson Dec 18 '13 at 14:31
  • @AaronCarlson Why is putting an open database connection in Session a bad idea? – enb081 Dec 19 '13 at 07:49
  • @enb081 - First, session is per user. Putting an open connection in session per user means that those connections will never be returned to the connection pool. Database connections are not serializable, and thus you will only be able to support in-proc session state, which will limit your application's ability to run in a server farm. I would first recommend you read about connection [pooling](http://msdn.microsoft.com/en-us/library/8xx3tyca(v=vs.110).aspx). – Aaron Carlson Dec 19 '13 at 14:48
  • For the record, my answer never suggested saving the open database connection. I suggested that you just call the database once to get all the data you would need and to then store that in a dictionary that you could use to look up your usernames and such. As opposed to opening a database connection and querying it every time you needed the username. – Chris Leyva Dec 19 '13 at 14:51
  • @ChrisLava - It would be helpful if you updated your answer to not include opening the connection as part of the performance problem. I believe the way you are lumping opening the database connection and querying together as the performance problem is _a_ reason why the OP went down the route of storing an open connection in session state. – Aaron Carlson Dec 19 '13 at 15:15
  • @AaronCarlson, Opening the connection many times is part of the problem though. Which is why the OP saw improvements when they stored the one connection and kept using that. I think the 2nd paragraph in "Suggestion 1" is very clear and does not imply storing an open connection. – Chris Leyva Dec 19 '13 at 15:25
  • @ChrisLava - Reading [this](http://programmers.stackexchange.com/questions/142065/creating-database-connections-do-it-once-or-for-each-query) may help you understand where I am coming from. It was clear to me that you were not suggesting storing an open connection. However, I can see how the OP came to that conclusion based on how you worded your answer. I am only trying to clear up a common misconception about the performance overhead of opening a SQL database connection when connection pooling is enabled. I feel that it would be a good improvement to your correct answer. – Aaron Carlson Dec 19 '13 at 17:42
19

The database design you have chosen is known as Entity-Attribute-Value (EAV), a design that is well known for its performance problems. The SQL Server team has released a whitepaper with guidance on EAV design; see Best Practices for Semantic Data Modeling for Performance and Scalability.

Alas, you already have the design in place, so what can you do about it now? The important thing is to reduce the myriad calls to the DB to one single call, and to execute one single set-oriented statement to retrieve the data. The name of the game is table-valued parameters:

-- UsersType is a user-defined table type with a UserId column (see the SqlFiddle below)
declare @users as UsersType;

insert into @users (UserId) values (7), (42), (89);

select ut.Id, 
  ut.Username, 
  df.Name as DataFieldName, 
  dfv.Value
from Users ut
join @users up on ut.Id = up.UserId
join DataFieldValues dfv on ut.Id = dfv.UserId
join DataFields df on dfv.DataFieldId = df.Id
order by ut.Id;

For a full example, see this SqlFiddle.
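
On the C# side, a rough sketch of passing the user ids as a table-valued parameter might look like this (it assumes a user-defined table type dbo.UsersType with a UserId column, matching the script above; the method name and connection-string handling are illustrative, and it uses System.Data and System.Data.SqlClient):

// Sketch: one round trip that returns one row per (user, data field) pair.
public static DataTable LoadFieldValues(string connectionString, IEnumerable<int> userIds)
{
    // Build the TVP payload: one row per user id.
    var idTable = new DataTable();
    idTable.Columns.Add("UserId", typeof(int));
    foreach (int id in userIds)
        idTable.Rows.Add(id);

    const string sql = @"
        select ut.Id, ut.Username, df.Name as DataFieldName, dfv.Value
        from Users ut
        join @users up on ut.Id = up.UserId
        join DataFieldValues dfv on ut.Id = dfv.UserId
        join DataFields df on dfv.DataFieldId = df.Id
        order by ut.Id;";

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(sql, conn))
    {
        SqlParameter p = cmd.Parameters.AddWithValue("@users", idTable);
        p.SqlDbType = SqlDbType.Structured;
        p.TypeName = "dbo.UsersType";   // the user-defined table type

        var result = new DataTable();
        conn.Open();
        using (SqlDataReader reader = cmd.ExecuteReader())
        {
            result.Load(reader);
        }
        return result;
    }
}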

While, strictly speaking, it is possible to retrieve a result in the shape you desire (data field names transposed as column names) using the PIVOT operator, I would very strongly advise against doing so. PIVOT on its own is a performance quagmire, and when you add the dynamic nature of the desired result set it is basically impossible to pull off. The traditional result set consisting of one row per attribute is trivial to parse into a table, because the required ORDER BY user Id guarantees a clean break between sets of correlated attributes.

Remus Rusanu
14

This is slow because under the hood you are making 20 x 10 = 200 queries to the database. The correct way would be to load everything in one go.

You should post some details about the way you load data. If you are using Entity Framework, you should use eager loading via the Include method.

// Load all blogs and related posts
var blogs1 = context.Blogs
                      .Include(b => b.Posts)
                      .ToList();

Some samples can be found here: http://msdn.microsoft.com/en-us/data/jj574232.aspx
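
Applied to the question's schema, a hypothetical sketch might look like this (it assumes EF entities User and DataFieldValue with a User.Values navigation property and a DataFieldValue.DataField reference, none of which exist in the question's code yet):

// Load all users together with their field values and field definitions in one query.
// Requires: using System.Data.Entity; ReportContext is a hypothetical DbContext.
using (var context = new ReportContext())
{
    var users = context.Users
                       .Include(u => u.Values.Select(v => v.DataField))
                       .ToList();
}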

EDIT:

It seems that you are not using the tools the .NET Framework gives you. These days you don't have to write your own database access code for simple scenarios like yours. You should also avoid concatenating HTML strings the way you do.

I would suggest redesigning your application using existing ASP.NET controls and Entity Framework.

Here is a sample with step by step instructions for you: http://www.codeproject.com/Articles/363040/An-Introduction-to-Entity-Framework-for-Absolute-B

Kaspars Ozols
  • I don't want to use the existing ASP.NET controls because I want to be flexible with the HTML and shape it the way I need it. – enb081 Dec 05 '13 at 13:51
  • 1
    Also, please focus on my case. I don't have one simple query that I could put in a SqlDataSource control. – enb081 Dec 05 '13 at 13:54
  • 5
    I am saying that you would not have this problem at all if you were using .NET the way it is meant to be used. Best way to optimize performance in your case is to reduce DB calls. One way is to start using Entity Framework which would simplify data access a lot. – Kaspars Ozols Dec 05 '13 at 14:40
  • Unless you're going to make use of the contents of `blogs1` more than once, that call to `.ToList()` is just a waste of time and memory. – Jon Hanna Dec 11 '13 at 00:48
  • Well, it depends on what you do with the result set. The post was about performance problems because of multiple DB queries being executed, so in my sample I included call to `.ToList()` just to ensure that all the data is read in one turn using single query. – Kaspars Ozols Dec 11 '13 at 08:19
  • I agree with your answer, but honestly, I completely disagree with the comment about not having this problem at all if you are using Entity Framework. You can make exactly the same mistake when using Entity Framework, especially when you don't know what you are doing. – eglasius Dec 13 '13 at 23:35
6

As Remus Rusanu said, you can get the data you want in the format you require by using the PIVOT relational operator. As far as the performance of PIVOT is concerned, I've found that it will depend on the indexing of your tables and the variability and size of the data set. I would be greatly interested in hearing more from him about his opinion of PIVOTs, as we are all here to learn. There is a great discussion on PIVOT vs JOINs here.

If the DataFields table is a static set then you may not need to worry about generating the SQL dynamically and can build yourself a stored procedure; if it does vary, you may need to take the performance hit of dynamic SQL (here is an excellent article on this) or use a different approach.

Unless you have further need for the data, try to keep the returned set to the minimum you need for display. It's a good way to reduce overhead, as everything will need to go over the network unless your DB is on the same physical server as the web server.

Making sure that you perform as few separate data calls as possible will reduce the time you spend opening and dropping connections.

You should always double-check data calls within a loop when the control for the loop is based on a (probably related?) data set, as this screams JOIN.

When you are experimenting with your SQL, try to become familiar with execution plans; these will help you figure out why you have slow-running queries. Check out these resources for more info.

Whatever approach you decide on, you need to figure out where the bottlenecks are in your code. Something as basic as stepping through the execution can help with this, as it will allow you to see for yourself where problems lie; it will also allow you to identify possible problems with your approach and build good design-choice habits.

Marc Gravell has some interesting points to make about C# data reading here; the article is a bit old but worth a read.

PIVOTing your data (sorry Remus ;-) ). Based on the data example you have provided, the following code will get what you need with no in-query recursion:

--Test Data
DECLARE @Users AS TABLE ( Id int
                        , Username VARCHAR(50)
                        , Name VARCHAR(50)
                        , Email VARCHAR(50)
                        , [Role] INT --Avoid reserved words for column names.
                        , Active INT --If this is only ever going to be 0 or 1 it should be a bit.
                        );

DECLARE @DataFields AS TABLE ( Id int
                        , Name VARCHAR(50)
                        , [Type] INT --Avoid reserved words for column names.
                        );

DECLARE @DataFieldsValues AS TABLE ( Id int
                        , UserId int
                        , DataFieldId int
                        , Value VARCHAR(50)
                        );

INSERT INTO @users  ( Id
                    , Username
                    , Name
                    , Email
                    , [Role]
                    , Active) 
VALUES (1,'enb081','enb081','enb081@mack.com',2,1),
       (2,'Mack','Mack','mack@mack.com',1,1),
       (3,'Bob','Bobby','bob@mack.com',1,0)


INSERT INTO @DataFields  
                    ( Id
                    , Name
                    , [Type]) 
VALUES (1,'DataField1',3),
       (2,'DataField2',1),
       (3,'DataField3',2),
       (4,'DataField4',0)

INSERT INTO @DataFieldsValues  
                    ( Id
                    , UserId
                    , DataFieldId
                    , Value) 
VALUES (1,1,1,'value11'),
       (2,1,2,'value12'),
       (3,1,3,'value13'),
       (4,1,4,'value14'),
       (5,2,1,'value21'),
       (6,2,2,'value22'),
       (7,2,3,'value23'),
       (8,2,4,'value24')

--Query
SELECT *
FROM
(   SELECT  ut.Username, 
            df.Name as DataFieldName, 
            dfv.Value
    FROM @Users ut
    INNER JOIN @DataFieldsValues dfv 
        ON ut.Id = dfv.UserId
    INNER JOIN @DataFields df 
        ON dfv.DataFieldId = df.Id) src
PIVOT
(   MIN(Value) FOR DataFieldName IN (DataField1, DataField2, DataField3, DataField4)) pvt

--Results
Username    DataField1  DataField2  DataField3  DataField4
enb081      value11     value12     value13     value14
Mack        value21     value22     value23     value24

The most important thing to remember is to try things out for yourself, as whatever we suggest might be altered by factors at your site that we aren't aware of.

Mack
  • 1
    +1 from me. If OP does end up using PIVOT, despite the dynamic SQL required and variable (='unknown at compile time') shape of result, then this is the way :) – Remus Rusanu Dec 10 '13 at 07:48
5

How are you accessing the database? Check the generated SQL from those queries with the Profiler if you are using EF, for example. Don't make a connection every time in the foreach loop.

I also would not build the HTML on the server side like this. Just return the object as the data source for a page control.

antao
5

Make sure that you are not making a connection to the database for each loop.

As far as I can see, the f.GetValue(u) part is a method that returns a string value fetched from the database.

Load the data into an object once and for all, and then do the same thing as f.GetValue(u) is doing here, but against that in-memory data (see the sketch below).
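
For example, a minimal sketch of that idea (column and table names follow the question; the connectionString variable and the cache itself are assumptions, not code from this answer):

// Hypothetical cache: load every (user, field) value once, then look them up in memory.
var valueCache = new Dictionary<Tuple<int, int>, string>();
using (var conn = new SqlConnection(connectionString))
using (var cmd = new SqlCommand("select UserId, DataFieldId, Value from DataFieldsValues", conn))
{
    conn.Open();
    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        while (reader.Read())
            valueCache[Tuple.Create(reader.GetInt32(0), reader.GetInt32(1))] = reader.GetString(2);
    }
}

// Later, instead of f.GetValue(u) hitting the database:
// string value = valueCache[Tuple.Create(u.Id, f.Id)];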

evilom
  • 1
    @enb081 For example, you have 20 loops that use f.GetValue(u)... each loop will connect to the database to fetch some data. The steps your app takes to fetch data from the database will take a lot more time than getting the data from a locally stored list of objects. – evilom Dec 06 '13 at 08:27
  • 1
    @enb081 This improves performance because creating new SqlConnections adds more time and memory to your app. Especially if there is some network delay, that will make things worse. If you take a look at my answer, I put the result of the query in a Dictionary which will let you look up the Usernames very quickly. Hope it helps. – Chris Leyva Dec 13 '13 at 15:00
4

Use an index on the primary key field of the table, and in the code-behind use StringBuilder.

Amarnath Balasubramanian
4

Worst problem there: tons of round trips to the database. Each time you get a value, a request goes over the network and waits for the result.

If you must have the user list in code first, then make sure that:

  1. You retrieve all the info in the user list in a single db call. If you have a set of user IDs, you can send it with a table-valued parameter.
  2. If the above didn't include the field values, send the list of user IDs and the list of field IDs in 2 table-valued parameters to retrieve it all in one go.

That should make a huge difference. With those 2 specific queries you have taken the network noise out of the way and can focus on improving the indexes if necessary.

Another gain you will get is on all that string concatenation. The first step is to replace it with a StringBuilder. The next step is to write to the output stream directly, so you don't need to hold all that data in memory (a sketch follows below) ... but it is unlikely you will need that; and if you do, because there is too much data, browsers will have trouble handling it anyway.
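
A minimal sketch of that last step, reusing the names from the question's code (HTML encoding is left out for brevity; this only addresses the string building, not the per-value database calls):

// Stream the rows to the client instead of building one big string in memory.
var writer = Response.Output;
writer.Write("<table><tr><th>Username</th>");
foreach (DataField f in fields)
    writer.Write("<th>{0}</th>", f.Name);
writer.Write("</tr>");
foreach (User u in users)
{
    writer.Write("<tr><td>{0}</td>", u.Username);
    foreach (DataField f in fields)
        writer.Write("<td>{0}</td>", f.GetValue(u));
    writer.Write("</tr>");
}
writer.Write("</table>");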

ps. not the OP scenario, but for those needing speed in bulk, you want bulk export instead: http://technet.microsoft.com/en-us/library/ms175937.aspx

eglasius
2

FAST... USE

  1. stored procedures
  2. use reader

    SqlDataReader dbReader = mySqlCommand.ExecuteReader();

    // read each row returned by the command
    while (dbReader.Read())
    {
         // READ DATA from the current row, e.g. dbReader.GetString(0)
    }
    dbReader.Close();
    
  3. DO PROPER INDEXES; if needed, go for partitions...

  4. Use query HINTs; SELECT with NOLOCK works for me

Query Hints (Transact-SQL) http://technet.microsoft.com/en-us/library/ms181714.aspx

Locking Hints http://technet.microsoft.com/en-us/library/aa213026(v=sql.80).aspx

Yeah the only time I will use LINQ will be if I call a stored procedure.

Search LINQ to SQL

BUT I AM OLD SCHOOL....

This Entity Framework, I got rid of it back with Entity Framework 1.0; it is good when you do a school project...

But it is very expensive as a compute instance...

READ IT ALL INTO MEMORY AND DO SOMETHING???? WHY AM I PAYING FOR SQL? USE some JSON file structure then....

Valentin Petkov
  • 2
    Your statement about Entity Framework is wrong. You don't need to read everything into memory just to access one column's values. EF is smart enough to load just the bare minimum of data; just use a correct LINQ statement that limits the returned data: `var relatedPostIds = context.Blogs.SelectMany(b => b.Posts).Select(p => p.ID).ToList();` – Kaspars Ozols Dec 07 '13 at 11:59
  • I simulated requests with EF over my company's DB five years ago with my team. Our biggest table was ~970M rows: 17s to extract 10 rows with EF, 0s old school. I have 10 years of experience in an analytics company. A year ago a senior .NET developer came who was ambitious enough to teach us the NEW EF and rebuild all the code with it; ask him why, after wasting 3 programmers' time for 4 months, they FAILED... They couldn't run even one of the queries which I do... Well, all my life I have used the EF approach, but for small things. – Valentin Petkov Dec 08 '13 at 01:33
  • The only thing Microsoft cares about is how to teach people to use EF and consume more compute instances in Azure. And when you analyze the Azure price structure you will see which part is most expensive. Then you start designing your applications to consume fewer resources, if you are the one paying. Eh, if someone else is paying, EF is an easy WIN!!!! – Valentin Petkov Dec 08 '13 at 01:40
  • 1
    The last two comments were more like a rant than anything argued and useful. I don't say EF is a magic bullet, but it makes software development a lot easier in most scenarios. You could run into performance problems if you work with a really huge DB, but in most cases EF works just fine. – Kaspars Ozols Dec 08 '13 at 21:17
  • Do you have a comment on this? http://stackoverflow.com/questions/20479792/error-selecting-from-large-data-table-causing-time-out-in-ef-code-first/20479978?noredirect=1#comment30607147_20479978 – Valentin Petkov Dec 09 '13 at 20:45
  • 1
    EF has changed a lot over the last 5 years. While it IS slower than old school, with the right settings and queries written correctly, it's not noticeably so. The biggest performance issue with EF is with bulk insert/update operations. Otherwise you are looking at a difference of less than 1s. Obviously incredibly complex queries are still best handled via SP/View, which can STILL be executed from EF. – Keith Dec 10 '13 at 19:39
  • Help out the guy with EF performance at the link above before you comment – Valentin Petkov Dec 10 '13 at 21:43
-1

Instead of a DataReader, use a DataAdapter and DataSet. Execute all the queries in one call, as shown below:

  string SqlQuery ="Select * from Users;Select * From DataFields;Select * From DataFieldsValues;";

This will open the SqlConnection only once, fire all three queries, and return three different data tables in the DataSet; then use your existing method for rendering (a sketch follows below).
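
A rough sketch of that approach (the connection string name is taken from the question's GetDataFromDB; the rest is illustrative, using System.Data and System.Data.SqlClient):

// One connection, one round trip, three result tables in a DataSet.
string sqlQuery = "Select * from Users; Select * From DataFields; Select * From DataFieldsValues;";
using (var conn = new SqlConnection(ConfigurationManager.ConnectionStrings["XXXX"].ToString()))
using (var adapter = new SqlDataAdapter(sqlQuery, conn))
{
    var ds = new DataSet();
    adapter.Fill(ds);   // Fill opens and closes the connection itself

    DataTable users = ds.Tables[0];
    DataTable dataFields = ds.Tables[1];
    DataTable dataFieldValues = ds.Tables[2];
    // ... join/render these in memory with your existing method
}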

Kamlesh
  • 2
    Down-vote, because although it will run all 3 at one call, it does NOT take advantage of the query engine preparing the join of data up front between user and components, thus making the user do extra work once the data IS retrieved. – DRapp Dec 09 '13 at 10:37
  • 1
    I agree completely with DRapp. In addition, you have now combined the logic into one query, making it harder to understand and maintain. Executing these 3 queries on the database does not cause the performance issue. You can create 3 query strings (one for Users, one for DataFields, and one for DataFieldsValues) and execute those queries separately within the same SqlConnection. – Chris Leyva Dec 13 '13 at 14:55