While this is off-topic for SO, I did want to give some guidelines you can follow:
I do this fairly regularly, but it is mostly manual work. First you have to decide what the "main entity" of your database is: does it concern people, accounts, credit cards, or something else? I work mainly in the financial sector, so for me it is usually something like accounts or mortgages. But it can be anything, really.
You have to decide what everything in your database is related to, the "base entity" so to speak.
Once you have decided on your main entity, you can select 10% of your database. For example, if your main entity is Accounts, you can select 10% of the AccountIds from an `Account` table and store them in a separate table.
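A minimal sketch of this sampling step, run against SQLite from Python so it is self-contained. The attached `prod` database stands in for the production schema, and `AccountsInScope` is a hypothetical name for the scope table. On SQL Server the same idea would be roughly `SELECT TOP (10) PERCENT AccountId INTO dbo.AccountsInScope FROM prod.Account ORDER BY NEWID();`, with `prod` again an assumed schema name.

```python
import sqlite3


def build_scope(conn: sqlite3.Connection, fraction: float = 0.10) -> int:
    """Copy a random `fraction` of prod.Account keys into main.AccountsInScope.

    Table and schema names are hypothetical placeholders for your own.
    """
    total = conn.execute("SELECT COUNT(*) FROM prod.Account").fetchone()[0]
    sample_size = round(total * fraction)
    conn.execute("DROP TABLE IF EXISTS main.AccountsInScope")
    conn.execute("CREATE TABLE main.AccountsInScope (AccountId INTEGER PRIMARY KEY)")
    # ORDER BY RANDOM() shuffles the keys; LIMIT keeps the first sample_size of them.
    conn.execute(
        "INSERT INTO main.AccountsInScope "
        "SELECT AccountId FROM prod.Account ORDER BY RANDOM() LIMIT ?",
        (sample_size,),
    )
    return sample_size


def demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # A second in-memory database plays the role of the production schema.
    conn.execute("ATTACH DATABASE ':memory:' AS prod")
    conn.execute("CREATE TABLE prod.Account (AccountId INTEGER PRIMARY KEY, Holder TEXT)")
    conn.executemany(
        "INSERT INTO prod.Account VALUES (?, ?)",
        [(i, f"holder {i}") for i in range(1, 101)],
    )
    build_scope(conn)
    return conn
```

Storing the keys in their own table, rather than filtering on the fly, means every later query joins against the same fixed sample.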
Next comes the hard work: you'll have to write queries for each and every one of your tables, correlating the respective entities to your main entity. So, if your main entity is a person, you want all of that person's addresses, all their accounts, all their phone numbers, and all the history related to them. These queries can get quite complicated, and you really need to understand your database well. You'll get queries like:
```sql
SELECT Src.*
INTO [dbo].[GTP_MSI_MORTGAGERELATION_MORTGAGE_RELATION]
FROM [ATV].[GTP_MSI_MORTGAGERELATION_MORTGAGE_RELATION] AS Src
INNER JOIN atv.GTP_MSI_MORTGAGEREQUEST_APPLICANT APP
    ON APP.MORTGAGE_RELATION_ID = Src.MORTGAGE_RELATION_ID
INNER JOIN ATV.GTP_MSI_MORTGAGELOAN_LOAN MLL
    ON MLL.MORTGAGE_REQUEST_ID = APP.REQUEST_ID
INNER JOIN dbo.DOOR D
    ON CONVERT(VARCHAR(255), D.NUMHYP) = MLL.LOAN_NUMBER
```
In this example, the `dbo.DOOR` table contains the selection of mortgage IDs in scope (the query finds the relation records of all persons/organizations associated with an in-scope mortgage).
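The same pattern can be tried out in miniature. In the sketch below (SQLite via Python, with heavily simplified, hypothetical tables loosely modeled on the query above), `prod` stands in for the `ATV` production schema and `main` for `dbo`. One deliberate change from the query above: it uses `EXISTS` instead of a plain `INNER JOIN`, because a join returns a relation once per matching loan, so someone who applied for two in-scope mortgages would otherwise be copied twice. The `CAST` plays the role of the `CONVERT`, matching a numeric mortgage number against a text loan number.

```python
import sqlite3


def copy_relations_in_scope(conn: sqlite3.Connection) -> None:
    """Copy into main.Relation every prod.Relation row tied to an in-scope mortgage.

    The chain mirrors the answer's query: Relation -> Applicant -> Loan -> Door.
    All table and column names here are simplified, hypothetical stand-ins.
    """
    # Empty table with the same columns as the production table.
    conn.execute("CREATE TABLE main.Relation AS SELECT * FROM prod.Relation WHERE 0")
    conn.execute(
        """
        INSERT INTO main.Relation
        SELECT src.*
        FROM prod.Relation AS src
        WHERE EXISTS (
            SELECT 1
            FROM prod.Applicant AS app
            JOIN prod.Loan AS mll ON mll.RequestId = app.RequestId
            -- Door holds numeric mortgage numbers; loan numbers are stored as text.
            JOIN main.Door AS d ON CAST(d.NumHyp AS TEXT) = mll.LoanNumber
            WHERE app.RelationId = src.RelationId
        )
        """
    )


def demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS prod")
    conn.executescript(
        """
        CREATE TABLE prod.Relation (RelationId INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE prod.Applicant (RequestId INTEGER, RelationId INTEGER);
        CREATE TABLE prod.Loan (LoanNumber TEXT, RequestId INTEGER);
        CREATE TABLE main.Door (NumHyp INTEGER);
        INSERT INTO prod.Relation VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
        INSERT INTO prod.Applicant VALUES (10, 1), (20, 1), (20, 2), (30, 3);
        INSERT INTO prod.Loan VALUES ('1001', 10), ('1002', 20), ('1003', 30);
        INSERT INTO main.Door VALUES (1001), (1002);
        """
    )
    copy_relations_in_scope(conn)
    return conn
```

With this data, Alice (linked to two in-scope loans) and Bob end up in `main.Relation` exactly once each, while Carol, whose only loan is out of scope, is left behind.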
What I usually do is load a snapshot of the production data into a separate schema, and use queries like the one above to fill the `dbo` schema. So `[ATV].[GTP_MSI_MORTGAGERELATION_MORTGAGE_RELATION]` in the example above contains production data, while its `dbo` namesake contains a smaller set of that production data (the data related to the mortgages in scope). Once the `dbo` schema is filled, I use anonymization software (I tend to use Red Gate Data Generator) to remove private or business-sensitive information. I can then extract the anonymized data and use it as the source for my development database.
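To show where the anonymization and extraction steps sit, here is a crude stand-in (not what Red Gate's tool does) using SQLite from Python: an attached `prod` database plays production, `main` holds the subset, and all names are hypothetical. It overwrites names with pseudonyms derived from the key, then copies only the subset into a fresh development database. Keys are left untouched so rows in different subset tables still join to each other; this assumes the keys themselves are not sensitive.

```python
import sqlite3


def anonymize_and_export(conn: sqlite3.Connection) -> sqlite3.Connection:
    """Mask names in the subset, then copy only the subset into a new dev database."""
    # Pseudonym derived from the key: keys stay intact, so joins between
    # subset tables keep working, but the real name is gone.
    conn.execute("UPDATE main.Relation SET Name = 'Relation ' || RelationId")
    conn.commit()
    dev = sqlite3.connect(":memory:")
    # backup() copies the database named "main" by default, so the attached
    # production database is not carried over.
    conn.backup(dev)
    return dev


def demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS prod")
    conn.execute("CREATE TABLE prod.Relation (RelationId INTEGER PRIMARY KEY, Name TEXT)")
    conn.execute("INSERT INTO prod.Relation VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol')")
    # Pretend only relations 1 and 2 are in scope.
    conn.execute(
        "CREATE TABLE main.Relation AS "
        "SELECT * FROM prod.Relation WHERE RelationId IN (1, 2)"
    )
    return anonymize_and_export(conn)
```

A real anonymization tool generates realistic replacement values per column; the point here is only the order of operations: subset first, mask second, export last.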
See also this (currently only available in Dutch, unfortunately, but if you run it through a translator it should still make a lot of sense).