220

If you were going to store a user agent in a database, how large would you accomdate for?

I found this technet article which recommends keeping UA under 200. It doesn't look like this is defined in the HTTP specification at least not that I found. My UA is already 149 characters, and it seems like each version of .NET will be adding to it.

I know I can parse the string out and break it down but I'd rather not.


EDIT
Based on this Blog IE9 will be changing to send the short UA string. This is a good change.


jeteon
  • 3,471
  • 27
  • 40
JoshBerke
  • 66,142
  • 25
  • 126
  • 164
  • I posted this question: http://stackoverflow.com/questions/17731699/is-there-a-site-which-shows-peoples-user-agent-without-cutting-off-long-ones – Erx_VB.NExT.Coder Jul 18 '13 at 18:56
  • What is your UA string? I found only some strings with 137 characters in my database (which is not too big). – Martin Thoma Jul 17 '14 at 18:36
  • When I asked this question five years ago or so. UA strings were getting long they included lots of extra stuff... – JoshBerke Jul 19 '14 at 19:23
  • 3
    I must ask, are the answers on this thread still relevant? Most of these answers are 8 years old. – Peschke Aug 12 '17 at 19:58

11 Answers11

138

HTTP specification does not limit length of headers at all. However web-servers do limit header size they accept, throwing 413 Entity Too Large if it exceeds.

Depending on web-server and their settings these limits vary from 4KB to 64KB (total for all headers).

vartec
  • 131,205
  • 36
  • 218
  • 244
  • 16
    Apache limits the maximum field length to 8k (http://httpd.apache.org/docs/2.2/mod/core.html#limitrequestfieldsize). – Gumbo Mar 17 '09 at 17:11
  • I'm less concerned with server limits since I am on IIS, I know it won't ever be bigger then their limit which is still preety large if memory serves.... – JoshBerke Mar 17 '09 at 18:42
  • 6
    @Josh -- memory serves you well, on IIS it's 16K by default. ;-) – vartec Mar 17 '09 at 21:11
  • For Tomcat server the max header length allowed is 8KB. Check here https://tomcat.apache.org/tomcat-8.0-doc/config/http.html – javaGroup456 Sep 16 '20 at 12:39
84

My take on this:

  • Use a dedicated table to store only UserAgents (normalize it)
  • In your related tables, store an Foreign Key value to point back to the UserAgent auto-increment primary key field
  • Store the actual UserAgent string in a TEXT field and care not about the length
  • Have another UNIQUE BINARY(32) (or 64, or 128 depending on your hash length) and hash the UserAgent

Some UA strings can get obscenely long. This should spare you the worries. Also enforce a maximum length in your INSERTer to keep UA strings it under 4KB. Unless someone is emailing you in the user-agent, it should not go over that length.

CodeAngry
  • 12,760
  • 3
  • 50
  • 57
  • 1
    TEXT field should not be used anymore as stated in MSDN : http://msdn.microsoft.com/en-us/library/ms187993(v=sql.90).aspx Instead, use NVARCHAR(MAX). Source: http://stackoverflow.com/questions/564755/sql-server-text-type-vs-varchar-data-type – Matt Roy Aug 14 '13 at 14:54
  • 3
    My database has 10,235 distinct user agent strings. I wanted to find the fastest hash algorithm that didn't produce any collisions. For my PHP environment I found md5 performed quickly at 2.3 seconds with no collisions. Interestingly I tried crc32 and crc32b and they also performed at 2.3 seconds with no collisions. But, because md5 has more combinations than crc32 and crc32b, md5 would likely have fewer possible collisions. Anyway, md5 is my choice and I expect it will work fine. – noctufaber Aug 13 '14 at 18:24
  • 4
    Why hash the User Agent? Is this for quick lookup or something? – Boom Dec 23 '15 at 16:12
  • 2
    @Boom Lookups and uniqueness as DB unique keys can only be so long. – CodeAngry Dec 24 '15 at 17:08
  • 3
    @noctufaber crc32 is not a hash, it does not attempt to be collision resistant. –  Sep 29 '16 at 09:55
  • 1
    @Boom For uniqueness since a varchar(max) can't be unique'ed. And I hold all UAs in a single table and reference IDs wherever I need them. Hence the hash table. – CodeAngry Nov 06 '21 at 18:58
  • I do not understand why normalization? Is there any reason behind it? I guess raw data can be useful and more reliable - AFAIK it is very hard to keep an up to date normalizer for almost pretty anything, regardless what is it. But things like email worth it but IMO normalizing user agent is a little overwhelming. Any Idea @CodeAngry – Kasir Barati May 29 '22 at 09:22
  • 1
    @KasirBarati I always normalize strings in a DB. Especially those that can run long. And since I only use them in reports... I don't care they are in another table when I need to query against them. But Imagine a 1KB UserAgent repeating 10K times in your Table where, let's say, you log traffic. I'd rather have a bigint repeat 10K times than 1KB. – CodeAngry Sep 21 '22 at 22:06
37

Noticed something like this in our apache logs. It looks abnormal to me but I regularly see such things in logs mostly from Windows systems.

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; (R1 1.6); SLCC1; .NET CLR 2.0.50727; InfoPath.2; OfficeLiveConnector.1.3; OfficeLivePatch.0.0; .NET CLR 3.5.30729; .NET CLR 3.0.30618; 66760635803; runtime 11.00294; 876906799603; 97880703; 669602703; 9778063903; 877905603; 89670803; 96690803; 8878091903; 7879040603; 999608065603; 799808803; 6666059903; 669602102803; 888809342903; 696901603; 788907703; 887806555703; 97690214703; 66760903; 968909903; 796802422703; 8868026703; 889803611803; 898706903; 977806408603; 976900799903; 9897086903; 88780803; 798802301603; 9966008603; 66760703; 97890452603; 9789064803; 96990759803; 99960107703; 8868087903; 889801155603; 78890703; 8898070603; 89970603; 89970539603; 89970488703; 8789007603; 87890903; 877904603; 9887077703; 798804903; 97890264603; 967901703; 87890703; 97690420803; 79980706603; 9867086703; 996602846703; 87690803; 6989010903; 977809603; 666601903; 876905337803; 89670603; 89970200903; 786903603; 696901911703; 788905703; 896709803; 96890703; 998601903; 88980703; 666604769703; 978806603; 7988020803; 996608803; 788903297903; 98770043603; 899708803; 66960371603; 9669088903; 69990703; 99660519903; 97780603; 888801803; 9867071703; 79780803; 9779087603; 899708603; 66960456803; 898706824603; 78890299903; 99660703; 9768079803; 977901591603; 89670605603; 787903608603; 998607934903; 799808573903; 878909603; 979808146703; 9996088603; 797803154903; 69790603; 99660565603; 7869028603; 896707703; 97980965603; 976907191703; 88680703; 888809803; 69690903; 889805523703; 899707703; 997605035603; 89970029803; 9699094903; 877906803; 899707002703; 786905857603; 69890803; 97980051903; 997603978803; 9897097903; 66960141703; 7968077603; 977804603; 88980603; 989700803; 999607887803; 78690772803; 96990560903; 98970961603; 9996032903; 9699098703; 69890655603; 978903803; 698905066803; 977806903; 9789061703; 967903747703; 976900550903; 88980934703; 8878075803; 8977028703; 97980903; 9769006603; 786900803; 98770682703; 78790903; 878906967903; 87690399603; 99860976703; 796805703; 87990603; 968906803; 967904724603; 999606603; 988705903; 989702842603; 96790603; 99760703; 88980166703; 9799038903; 98670903; 697905248603; 7968043603; 66860703; 66860127903; 9779048903; 89670123903; 78890397703; 97890603; 87890803; 8789030603; 69990603; 88880763703; 9769000603; 96990203903; 978900405903; 7869022803; 699905422903; 97890703; 87990903; 878908703; 7998093903; 898702507603; 97780637603; 966907903; 896702603; 9769004803; 7869007903; 99660158803; 7899099603; 8977055803; 99660603; 7889080903; 66660981603; 997604603; 6969089803; 899701903; 9769072703; 666603903; 99860803; 997608803; 69790903; 88680756703; 979805677903; 9986047703; 89970803; 66660603; 96690903; 8997051603; 789901209803; 8977098903; 968900326803; 87790703; 98770024803; 697901794603; 69990803; 887805925803; 968908903; 97880603; 897709148703; 877909476903; 66760197703; 977908603; 698902703; 988706504803; 977802026603; 88680964703; 8878068703; 987705107903; 978902878703; 8898069803; 9768031703; 79680803; 79980803; 669609328703; 89870238703; 99960593903; 969904218703; 78890603; 9788000703; 69690630903; 889800982903; 988709748803; 7968052803; 99960007803; 969900800803; 668604817603; 66960903; 78790734603; 8868007703; 79780034903; 8878085903; 976907603; 89670830803; 877900903; 969904889703; 7978033903; 8987043903; 99860703; 979805903; 667603803; 976805348603; 999604127603; 97790701603; 78990342903; 98770672903; 87990253903; 9877027703; 97790803; 877901895603; 8789076903; 896708595603; 997601903; 799806903; 97690603; 87790371703; 667605603; 99760303703; 97680283803; 788902750803; 787909803; 79780603; 79880866903; 9986050903; 87890543903; 979800803; 97690179703; 876901603; 699909903; 96990192603; 878904903; 877904734903; 796801446903; 977904803; 9887044803; 797805565603; 98870789703; 7869093903; 87790727703; 797801232803; 666604803; 9778071903; 9799086703; 6969000903; 89670903; 8799075903; 897708903; 88680903; 97980362603; 97980503903; 889803256703; 88980388703; 789909376803; 69690703; 6969025903; 89970309903; 96690703; 877901847803; 968901903; 96690603; 88680607603; 7889001703; 789904761803; 976807703; 976902903; 878907889703; 9897014903; 896707046603; 696909903; 666603998903; 969902703; 79680421803; 9769075603; 798800192703; 97990903; 9689024903; 668604803; 969908671903; 9996094703; 69990642703; 97890895903; 977805619903; 79980859903; 88980443803; 98970649603; 997602703; 888802169903; 699907803; 667602028803; 786903283903; 997607703; 969909803; 798809925903; 9976045603; 97790903; 9789001903; 966903603; 9789069603; 968906603; 6989091803; 896701603; 6979059803; 978803903; 997606362603; 88980803; 98970803; 88880921703; 8997065703; 899700703; 698908703; 797801027903; 7889050903; 87890603; 78690703; 99660069703; 97980309903; 976800603; 666606803; 898707703; 79880019803; 66960250803; 7978049803; 88780602603; 79680903; 88880792703; 96990903; 667608603; 87790730903; 98970903; 9699032903; 8987004803; 88880703; 89770046603; 978800803; 969908903; 9798022603; 696901903; 799803703; 989703703; 668605903; 79780903; 998601371703; 796803339703; 87890922603; 898708903; 9966061903; 66960891903; 96790903; 8779050803; 98870858803; 976909298603; 9887029903; 669608703; 979806903; 878903803; 99960703; 9789086703; 979801803; 66960008703; 979806830803; 99760212703; 786906603; 797807603; 789907297703; 96990703; 786901603; 796807766603; 896702651603; 789902585603; 66660925903; 9986085703; 66960302703; 69890703; 789900703; 89970903; 9679060703; 9789002903; 979908821603; 986708140803; 976809828703; 7988082803; 79680997903; 99960803; 9788081903; 979805703; 787908603; 66960602803; 9887098703; 978803237703; 888806804603; 999604703; 977904703; 966904635703; 97680291703; 977809345603; 8878046703; 988709803; 976900773603; 989703903; 88780198603; 87790603; 986708703; 78890604703; 87790544803; 976809850903; 887806703; 987707527603; 79880803; 9897059603; 897709820603; 97880804803; 66960026703; 9789062803; 9867090803; 669600603; 8967087703; 78890903; 89770903; 97980703; 976802687603; 66860400803; 979901288603; 96990160903; 99860228903; 966900703; 66760603; 9689035703; 9779064703; 7968023603; 87890791903; 98770870603; 9798005803; 6969087903; 9779097903; 6979065703; 699903252603; 79780989703; 87690901803; 978905763903; 977809703; 97790369703; 899703269603; 8878012703; 78790803; 87690395603; 8888042803; 667607689903; 8977041803; 6666085603; 6999080703; 69990797803; 88680721603; 99660519803; 889807603; 87890146703; 699906325903; 89770603; 669608615903; 9779028803; 88880603; 97790703; 79780703; 97680355603; 6696024803; 78790784703; 97880329903; 9699077703; 89870803; 79680227903; 976905852703; 8997098903; 896704796703; 66860598803; 9897036703; 66960703; 9699094703; 9699008703; 97780485903; 999603179903; 89770834803; 96790445603; 79680460903; 9867009603; 89870328703; 799801035803; 989702903; 66960758903; 66860150803; 6686088603; 9877092803; 96990603; 99860603; 987703663603; 98870903; 699903325603; 87790803; 97680703; 8868030703; 9799030803; 89870703; 97680803; 9669054803; 6979097603; 987708046603; 999608603; 878904803; 998607408903; 968903903; 696900703; 977907491703; 6686033803; 669601803; 99960290603; 887809169903; 979803703; 69890903; 699901447903; 8987064903; 799800603; 98770903; 8997068703; 967903603; 66760146803; 978805087903; 697908138603; 799801603; 88780964903; 989708339903; 8967048603; 88880981603; 789909703; 796806603; 977905977603; 989700603; 97780703; 9669062603; 88980714603; 897709545903; 988701916703; 667604694903; 786905664603; 877900803; 886805490903; 89970559903; 99960531803; 7998033903; 98770803; 78890418703; 669600872803; 996605216603; 78690962703; 667604903; 996600903; 999608903; 9699083803; 787901803; 97780707603; 787905312703; 977805803; 8977033703; 97890708703; 989705521903; 978800703; 698905703; 78890376903; 878907703; 999602903; 986705903; 668602719603; 979901803; 997606903; 66760393903; 987703603; 78790338903; 96890803; 97680596803; 666601603; 977902178803; 877902803; 78790038603; 8868075703; 99960060603)

bluish
  • 26,356
  • 27
  • 122
  • 180
harry
  • 1,061
  • 1
  • 13
  • 13
  • 43
    Is there anyone that would like to comment on what on earth is going on with this user agent? lol I must add, I am curious how such a beast can form. – Erx_VB.NExT.Coder Jul 30 '12 at 20:41
  • 28
    If anyone is curious; **this one clocks in at 8010 chars**. How could anyone on the browser team have thought that this was a good idea? It's as mad as a bag of cats! – Doctor Jones Jul 17 '13 at 15:04
  • 20
    Does truncating this user agent string at 256 or 512 get rid of any data that is useful at all? – JackAce May 12 '14 at 16:24
  • 21
    I've made some observations, but not yet worked it out. There are `642` numbers. The first four numbers are always `6`, `7`, `8`, or `9`. The fifth number is always `0`. The last three are always `603`, `703`, `803`, or `903`. Perhaps someone might recognise that pattern? (Half-life 3 confirmed?) – meshy Oct 10 '16 at 13:41
  • 3
    Interesting. I have now added code to truncate the UA string to 255 chars for my db logs. – Deepak Thomas Feb 22 '17 at 13:55
29

Since it's for database purposes and there is no practical limit i'd go for a UserAgents Table with UserAgentId as Int and UserAgentString as NVarChar(MAX) and use a foreign key on the original table.

Diadistis
  • 12,086
  • 1
  • 33
  • 55
  • 23
    You would probably end up with user agents on a 1-to-a-handful relationship with your users. Most user agents get so tweaked by the items a user has installed, and in a particular order, that they are almost personally identifiable (one other answer has a good example of this happening). In fact, the EFF did a [study (pdf)](https://panopticlick.eff.org/browser-uniqueness.pdf) about it. – patridge Mar 08 '11 at 20:29
  • 1
    @patridge +1 for link, very good study. It's a bit off topic because they look at several fingerprints and not only the user agent strings. In a real world scenario, for a site that gets several million page views per month you would end up with a few thousand user agent string, so normalizing makes sense IMHO. With that said, I'm not very positive on storing user agent strings in the database :P – Diadistis Mar 09 '11 at 06:07
  • @patridge The link to the study is now broken: [updated link](https://panopticlick.eff.org/static/browser-uniqueness.pdf) – Boris Dalstein Mar 06 '19 at 18:37
  • 1
    @patridge I agree that your idea sounds plausible, but my data disagrees with us both. I am working with exactly this kind of system right now, and I have around 70k unique UAs for 1.2m users. The reason I am on this page is that I chose 256 as a limit on my database field and have found that 50k out of the 70k were truncated so I have lost some information. I'm going to increase it to 4k now. Will be interesting to know how many would have been unique if they were not truncated – Darren H Feb 24 '20 at 09:45
14

How's this for big?:

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; YPC 3.2.0; SearchSystem6829992239; SearchSystem9616306563; SearchSystem6017393645; SearchSystem5219240075; SearchSystem2768350104; SearchSystem6919669052; SearchSystem1986739074; SearchSystem1555480186; SearchSystem3376893470; SearchSystem9530642569; SearchSystem4877790286; SearchSystem8104932799; SearchSystem2313134663; SearchSystem1545325372; SearchSystem7742471461; SearchSystem9092363703; SearchSystem6992236221; SearchSystem3507700306; SearchSystem1129983453; SearchSystem1077927937; SearchSystem2297142691; SearchSystem7813572891; SearchSystem5668754497; SearchSystem6220295595; SearchSystem4157940963; SearchSystem7656671655; SearchSystem2865656762; SearchSystem6520604676; SearchSystem4960161466; .NET CLR 1.1.4322; .NET CLR 2.0.50727; Hotbar 10.2.232.0; SearchSystem9616306563; SearchSystem6017393645; SearchSystem5219240075; SearchSystem2768350104; SearchSystem6919669052; SearchSystem1986739074; SearchSystem1555480186; SearchSystem3376893470; SearchSystem9530642569; SearchSystem4877790286; SearchSystem8104932799; SearchSystem2313134663; SearchSystem1545325372; SearchSystem7742471461; SearchSystem9092363703; SearchSystem6992236221; SearchSystem3507700306; SearchSystem1129983453; SearchSystem1077927937; SearchSystem2297142691; SearchSystem7813572891; SearchSystem5668754497; SearchSystem6220295595; SearchSystem4157940963; SearchSystem7656671655; SearchSystem2865656762; SearchSystem6520604676; SearchSystem4960161466; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)

bluish
  • 26,356
  • 27
  • 122
  • 180
cd.
  • 149
  • 1
  • 2
6

There is no stated limit, only the limit of most HTTP servers. Keeping that in mind however, I would implement a column with a reasonable fixed length (use Google to find a list of known user agents, find the largest and add 50%), and just crop any user agent that is too long - any exceptionally long user agent is probably unique enough even when cropped, or is the result of some kind of bug or "hack" attempt.

Jeremy Stein
  • 19,171
  • 16
  • 68
  • 83
David
  • 24,700
  • 8
  • 63
  • 83
  • 1
    Looks like the standard length is 120-150, based on http://www.networkinghowtos.com/howto/common-user-agent-list/ . Therefore I would **crop the Max Length at 200.** – gene b. Sep 06 '19 at 18:44
5

Assume the user agent string has no limit on its length and prepare to store such a value. As you've seen, length is unpredictable.

In Postgres, there's a text type that accepts strings of unlimited length. Use that.

Most likely though, you'll have to start truncating at some point. Call it good at a reasonably useful increment (200, 1k, 4k) and throw away the rest.

Brad Koch
  • 19,267
  • 19
  • 110
  • 137
5

I got this user agent today, overflowing our vendor's storage field:

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; MDDR; OfficeLiveConnector.1.3; OfficeLivePatch.0.0; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)

Ridiculous! 229 chars?

So take that size, double it, double it again, and you should be set until Microsoft's next blunder (maybe this time next year).

Go bigger than 1000!

bluish
  • 26,356
  • 27
  • 122
  • 180
3

Not an indication of how big a user agent can get, as there's plenty of answers showing the edge cases they've came across, but the longest that could find on http://www.useragentstring.com/pages/useragentstring.php?name=All was 250 bytes.

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; Media Center PC 5.0; SLCC1; OfficeLiveConnector.1.5; OfficeLivePatch.1.3; .NET4.0C; Lunascape 6.3.

Travis
  • 13,311
  • 4
  • 26
  • 40
2

Here is one that is 257

Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; GTB6; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.30; InfoPath.2; .NET CLR 3.0.04506.648; OfficeLiveConnector.1.3; OfficeLivePatch.0.0; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)

bluish
  • 26,356
  • 27
  • 122
  • 180
  • I've seen up to 255 characters so far on a very very low traffic site. So not surprising. .Net 4.0 will probally add another 20 chars as well. – JoshBerke Aug 07 '09 at 15:41
2

I'll give you the standard answer:

Take the largest possible value you can possibly imagine it being, double it, and that's your answer.

Ed Marty
  • 39,590
  • 19
  • 103
  • 156