0

I am a little inexperienced due to being self-taught so please bear with me.

I am using SQL Server 2016. I have made the following query to import some CSVs into SQL. The table you see is the table they are supposed to go into.

USE [open secrets]

CREATE TABLE [dbo].[Lobbyists] (
    [uniqID] [varchar] (36) NOT NULL,
    [lobbyist] [varchar] (50) NULL,
    [lobbyist_raw] [varchar] (50) NULL,
    [lobbyist_id] [char] (12) NULL,
    [year] [char] (4) NULL,
    [Official Position] [varchar] (254) NULL,
    [cid] [char] (9) NULL,
    [formercongmem] [char] (1) NULL
) ON [PRIMARY]

BULK
INSERT Lobbyists
FROM 'C:\aaa open secrets\Lobby\LOB_LOBBYIST\??????????????????'
WITH
(
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
GO

SELECT*FROM Lobbyists

Here are the files I want to import:

screenshot

As you can see they all live in the directory and have a similar naming scheme. I need to import everything from lob_lobbyist000 all the way to lob_lobbyist123. There are no missing numbers.

My goal is to import them all at once instead of having to do so one at a time using only SQL Server Management Studio. Please show me the correct query to accomplish this. An explanation of what is going on in the query would also be much appreciated.

UPDATE current query:

USE [open secrets]

CREATE TABLE [dbo].[Lobbyists] (
    [uniqID] [varchar] (36) NOT NULL,
    [lobbyist] [varchar] (50) NULL,
    [lobbyist_raw] [varchar] (50) NULL,
    [lobbyist_id] [char] (12) NULL,
    [year] [char] (4) NULL,
    [Official Position] [varchar] (254) NULL,
    [cid] [char] (9) NULL,
    [formercongmem] [char] (1) NULL
) ON [PRIMARY]

--BULK
--INSERT Lobbyists
--FROM 'C:\aaa open secrets\Lobby\LOB_LOBBYIST\??????????????????'
--WITH
--(
--FIELDTERMINATOR = ',',
--ROWTERMINATOR = '\n'
--)
--GO

--SELECT*FROM Lobbyists


--SELECT * INTO Lobbyists_20170309 FROM Lobbyists


DECLARE @ALLFILENAMES TABLE (WHICHPATH VARCHAR(255),WHICHFILE varchar(255))
declare @filename varchar(255),
        @path     varchar(255),
        @sql      varchar(8000),
        @cmd      varchar(1000)


--get the list of files to process:
SET @path = '"C:\aaaopensecrets\LOB_LOBBYIST\'
SET @cmd = 'dir ' + @path + '*.txt" /b'
INSERT INTO  @ALLFILENAMES(WHICHFILE)
EXEC Master..xp_cmdShell @cmd
UPDATE @ALLFILENAMES SET WHICHPATH = @path where WHICHPATH is null

SELECT * FROM @ALLFILENAMES 

--cursor loop
declare c1 cursor 
for SELECT WHICHPATH,WHICHFILE 
FROM @ALLFILENAMES
open c1
fetch next from c1 into @path,@filename
While @@fetch_status <> -1
  begin
  --bulk insert won't take a variable name, so make a sql and execute it instead:
   set @sql = 'BULK INSERT Lobbyists FROM ''' + @path + @filename + ''' '
       + '     WITH ( 
               FIELDTERMINATOR = '','', 
               ROWTERMINATOR = ''\n''
            ) '
print @sql
exec (@sql)

  fetch next from c1 into @path,@filename
  end
close c1
deallocate c1

The result:

(125 row(s) affected)

(125 row(s) affected)

(125 row(s) affected)
BULK INSERT Lobbyists FROM '"C:\aaaopensecrets\LOB_LOBBYIST\lob_lobbyist000.txt'      WITH ( 
               FIELDTERMINATOR = ',', 
               ROWTERMINATOR = '\n'
            ) 
Msg 4861, Level 16, State 1, Line 1
Cannot bulk load because the file ""C:\aaaopensecrets\LOB_LOBBYIST\lob_lobbyist000.txt" could not be opened. Operating system error code 123(The filename, directory name, or volume label syntax is incorrect.).

the error loops on for every single file

ana ng
  • 107
  • 2
  • 8
  • This is the first link I found when I googled "BULK INSERT all files in folder" http://stackoverflow.com/questions/16076309/import-multiple-csv-files-to-sql-server-from-a-folder. Please try it and post back any issues. – Nick.Mc Mar 09 '17 at 02:26
  • @Nick.McDermaid The explanation was confusing to me. I don't want to import the file names and I don't know how to separate the parts for importing filenames from importing the actual content of the files. – ana ng Mar 09 '17 at 02:34
  • You need to get the list of files so you know what to import. You may as well store these for auditing and troubleshooting purposes. Also to keep track of which files you have imported and which ones you haven't. If you still don't want to that's fine - just store them in a table variable instead of a real table. If you are uncomfortable with a T-SQL solution you might want to consider a SSIS solution instead. A google search on "SSIS import all files in folder" gives you many examples – Nick.Mc Mar 09 '17 at 02:53
  • @Nick.McDermaid The only SQL programs I have are Management Studio and SQL Profiler. I only have experience using Management Studio. – ana ng Mar 09 '17 at 02:57
  • If you are uncomfortable with T-SQL then you might want to look into SSIS (which is developed in the free downloadable tool called SQL Server Data Tools). Otherwise if you want to stick with T-SQL(which is what SQL Server Management Studio uses), you're going to need to learn about things like cursors and `xp_cmdshell` and `exec` from that other post. I suggest you try the approach in my link and post any specific issues or errors. For starters if you don't want to store filenames, I can suggest a change to that code. I'm not going to custom write you a script but someone else probably will. – Nick.Mc Mar 09 '17 at 03:08
  • @Nick.McDermaid Okay I will give your suggestion a swing. – ana ng Mar 09 '17 at 03:10
  • @Nick.McDermaid this is what I wrote: `SET 'C:\aaa open secrets\Lobby\LOB_LOBBYIST' INSERT INTO Lobbyists DECLARE c1 cursor for SELECT ALL 'FROM C:\aaa open secrets\Lobby\LOB_LOBBYIST' WHERE .txt OPEN c1 FETCH NEXT FROM c1 into Lobbyists WHILE @@FETCH_STATUS <>1 BEGIN SET BULK INSERT Lobbyists FROM 'C:\aaa open secrets\Lobby\LOB_LOBBYIST' WITH ( FIELDTERMINATOR = ',', ROWTERMINATOR = '\n' ) GO` – ana ng Mar 09 '17 at 03:17
  • That's excellent that you gave it a try. Please also explain outcomes (I guess you got an error - please post it). In the meantime, what you need to do is copy the answer from the other question exactly as it is, and try and identify which pieces need to be changed for your situation. For example, `SET @path = 'C:\Dump\'` would need to change to `SET @path = 'C:\aaa open secrets\Lobby\LOB_LOBBYIST\'`. I also need to confirm that you are doing this in a DEV environment, not production? Also are you intending to run this repeatedly or just once? – Nick.Mc Mar 09 '17 at 03:24
  • @Nick.McDermaid I have posted the error and have plugged in everything I thought was relevant from the answer and changed the according to my situation. I am not sure what you mean by dev vs production environment. I am a journalist working on a personal database for an article if that answers that question. I am pretty sure I only need to run this query once. Once all the files are imported to the table I don't think I'll need to run it again. – ana ng Mar 09 '17 at 03:30
  • Sorry about all the lingo. I will take that other code and change for you and post it and we'll work on it together. I'll be particularly enthusiastic if you are doing investigation on lobbyists! – Nick.Mc Mar 09 '17 at 03:41
  • @Nick.McDermaid Thank you very much! I feel like this is an opportunity to learn how to do this particular type of query. I am doing an investigation into campaign finance. Lobbyists are a huge part of that investigation. – ana ng Mar 09 '17 at 03:45

2 Answers2

1

Adapted from here:

Import Multiple CSV Files to SQL Server from a Folder

Before you run the code below, back up your table by running this command:

SELECT * INTO Lobbyists_20170309 FROM Lobbyists

Once you've done that, you have a copy of your table.

Now try running this:

DECLARE @ALLFILENAMES TABLE (WHICHPATH VARCHAR(255),WHICHFILE varchar(255))
declare @filename varchar(255),
        @path     varchar(255),
        @sql      varchar(8000),
        @cmd      varchar(1000)


--get the list of files to process:
SET @path = '"C:\aaa open secrets\Lobby\LOB_LOBBYIST\'
SET @cmd = 'dir ' + @path + '*.txt" /b'
INSERT INTO  @ALLFILENAMES(WHICHFILE)
EXEC Master..xp_cmdShell @cmd
UPDATE @ALLFILENAMES SET WHICHPATH = @path where WHICHPATH is null

SELECT * FROM @ALLFILENAMES 

--cursor loop
declare c1 cursor 
for SELECT WHICHPATH,WHICHFILE 
FROM @ALLFILENAMES
open c1
fetch next from c1 into @path,@filename
While @@fetch_status <> -1
  begin
  --bulk insert won't take a variable name, so make a sql and execute it instead:
   set @sql = 'BULK INSERT Lobbyists FROM ''' + @path + @filename + ''' '
       + '     WITH ( 
               FIELDTERMINATOR = '','', 
               ROWTERMINATOR = ''\n''
            ) '
print @sql
exec (@sql)

  fetch next from c1 into @path,@filename
  end
close c1
deallocate c1
Community
  • 1
  • 1
Nick.Mc
  • 18,304
  • 6
  • 61
  • 91
  • Looking through the query it looks like the first line you posted backs up a single table. Am I correct in thinking that this is a lot like doing a full back up of the server except it does so for a single table? Does this mean that I can pick and choose what parts of my server to back up whenever I need to? It looks more efficient than backing up the whole server. – ana ng Mar 09 '17 at 03:52
  • Msg 15281, Level 16, State 1, Procedure xp_cmdshell, Line 1 [Batch Start Line 0] SQL Server blocked access to procedure 'sys.xp_cmdshell' of component 'xp_cmdshell' because this component is turned off as part of the security configuration for this server. A system administrator can enable the use of 'xp_cmdshell' by using sp_configure. For more information about enabling 'xp_cmdshell', search for 'xp_cmdshell' in SQL Server Books Online. – ana ng Mar 09 '17 at 03:55
  • Here's a guide to enabling xp_cmdshell. Just run it all once in a different window. http://stackoverflow.com/questions/5131491/enable-xp-cmdshell-sql-server – Nick.Mc Mar 09 '17 at 03:57
  • When you backup your database, you make a copy of it in another file. You can now take that file somewhere else safe, i.e. take it offsite so say your computer fails, you still have the file externally somewhere else and it's safe and you can restore it again. The T-SQL code that I posted is a fast easy way to make a backup of a table by copying all the data into another table _inside_ your database. If your laptop (or even your database) fails, you've lost your entire database. – Nick.Mc Mar 09 '17 at 04:00
  • It's important that you back up both your database and your computer and if possible take the backups offsite. That might include copying the backup to a cloud provider (but of course this is quite insecure nowadays). Is your analysis around textual elements of data or more around the dollars involved? There is another free tool you may want to use called Power BI which is great for visualising numeric analysis. It also lets you report across multiple files – Nick.Mc Mar 09 '17 at 04:02
  • I have run the query to enable xp_cmdshell. Upon re-running the query you wrote it tells me two rows were affected which doesn't seem quite right. If I try to select*from Lobbyists it will say 0 rows affected. It will show me a table with all the column names but no values. – ana ng Mar 09 '17 at 04:04
  • So the table backup is not an alternative to a fullbackup. I regularly run full back ups. I think I will copy paste the backup into several external drives, along with my saved queries and all the text files my data came from. My analysis will be based on both text and dollar movement. I will definitely check out Power BI. – ana ng Mar 09 '17 at 04:09
  • Oh sorry I think your sources might be .txt files, not .csv files. I've updated the code to recognise .txt instead of .csv. Please copy the code out and try again. – Nick.Mc Mar 09 '17 at 04:11
  • I have copied the updated version. Not much has changed as far as the results of executing the query goes. When I look at the updated version of the query I still see some references to csv. Should those be changed to txt as well? – ana ng Mar 09 '17 at 04:18
  • Sorry I thought I edited it but it mustn't have stuck. You need to change `*.csv` to `*.txt` in one place – Nick.Mc Mar 09 '17 at 04:40
  • I have made the change from `*.csv` to `*.txt`. Now it tells me two rows have been affected twice and then that 0 rows have been affected. – ana ng Mar 09 '17 at 04:44
  • I made a few more changes. Please try again. Also you need to establish what the extension is on those files - is it .txt? Please see here: http://kb.winzip.com/kb/entry/26/ – Nick.Mc Mar 09 '17 at 04:48
  • They are all .txt. I thought .csv and .txt were the same. I must have been mistaken. I am trying you changes now. – ana ng Mar 09 '17 at 04:50
  • They are the same but the code is written to only import one of them (this can be changed). It helps to troubleshoot if I know for sure what the extension is. – Nick.Mc Mar 09 '17 at 04:51
  • It says `Cannot bulk load. The file "C:\aaa open secrets\Lobby\LOB_LOBBYIST\The system cannot find the path specified." does not exist.` Perhaps it should be directed to one of the files inside the folder rather than just the folder that contains all the files. – ana ng Mar 09 '17 at 05:12
  • I've made another edit. The step that picks up the file list didn't work. The change I made should make it work – Nick.Mc Mar 09 '17 at 05:15
  • I'm pretty sure this is because you have spaces in the path. Can you go into file explorer and rename folder `aaa open secrets` to something without spaces, then go and change the SQL code to reflect the new path and try again – Nick.Mc Mar 09 '17 at 05:31
  • No but that is okay. I ended up importing them one by one. – ana ng Mar 10 '17 at 17:29
  • You were so close its a shame you gave up. I hope you learnt something anyway – Nick.Mc Mar 11 '17 at 06:46
-1
Create procedure usp_ImportMultipleFiles @filepath varchar(500), 
    @pattern varchar(100), @TableName varchar(128)
as
set quoted_identifier off
declare @query varchar(1000)
declare @max1 int
declare @count1 int
Declare @filename varchar(100)
set @count1 =0
create table #x (name varchar(200))
set @query ='master.dbo.xp_cmdshell "dir '+@filepath+@pattern +' /b"'
insert #x exec (@query)
delete from #x where name is NULL
select identity(int,1,1) as ID, name into #y from #x 
drop table #x
set @max1 = (select max(ID) from #y)
--print @max1
--print @count1
While @count1 <= @max1
begin
set @count1=@count1+1
set @filename = (select name from #y where [id] = @count1)
set @Query ='BULK INSERT '+ @Tablename + ' FROM "'+ @Filepath+@Filename+'" 
    WITH ( FIELDTERMINATOR = ",",ROWTERMINATOR = "\n")'
--print @query
exec (@query)
insert into logtable (query) select @query
end

Exec usp_ImportMultipleFiles 'c:\myimport\', '*.csv', 'Account'

  • 1
    If you copy paste, please reference the source http://www.databasejournal.com/features/mssql/article.php/3325701/Import-multiple-Files-to-SQL-Server-using-T-SQL.htm – Nick.Mc Mar 09 '17 at 05:26