
I have a folder of about 10 thousand files and I need to write a bash shell script that will pull a COLUMN of data out and put it in a file. Help??? Please and thank you!

EDIT To Include:

#!/bin/bash

cd /Users/Larry/Desktop/TestFolder

find . -maxdepth 1 -mindepth 1 -type d
sed '4q;d'

A separate attempt

for dir in /Users/Larry/Desktop/TestFolder
do
  dir=${dir%*/}
  sed -n '4q;d' > Success.txt
done

The files are comma-separated value (.csv) files that open in a spreadsheet program like Numbers or Excel. I want to extract a single column from each file, but there are at least 10 thousand files in each folder, so passing them all as arguments gives an "Argument list too long" error.

Another attempt

find /Users/Larry/Desktop/modified -type f -maxdepth 1 -name '.csv' -print0 | xargs -0 awk -F '","' {print $2}'

find /Users/Larry/Desktop/modified -type f -maxdepth 1 -name '.csv' -print0 | xargs -0 awk -F '"*,*' '{print $2}' > DidItWorkThisTime.csv

The link to a previous question does not work for large sets of files.

user3736201
    Use `cut` or `awk`, depending on how the columns are delimited. – Barmar Jun 13 '14 at 03:01
  • Lol if I had come up with anything useful I would have posted it. My codes (multiple attempts, and those from a friend) flat out haven't worked. – user3736201 Jun 13 '14 at 03:05
    If you had code that worked, you wouldn't need to ask a question. Post what you tried, and we'll help you get it working. Either that, or hire a programmer who knows what he's doing. – Barmar Jun 13 '14 at 03:06
  • You're not using the files as input to `sed` in either script. – Barmar Jun 13 '14 at 03:08
  • I don't understand, sed isn't supposed to be in the script at all? – user3736201 Jun 13 '14 at 03:08
  • Where are you selecting a column out of the file? `4q;d` prints the 4th line, not a column. – Barmar Jun 13 '14 at 03:10
  • I don't know how to grab a column and not a line? – user3736201 Jun 13 '14 at 03:10
  • What I mean is it should be `sed 'commands' filename` or `somecommand | sed 'commands'` -- it reads the data from a named file or piped input. – Barmar Jun 13 '14 at 03:11
  • How are the columns delimited? Space, TAB, multiple spaces? Can you show some sample input? – Barmar Jun 13 '14 at 03:12
  • They are comma separated value .csv files. – user3736201 Jun 13 '14 at 03:14
  • "CELLS","SUM","1 1","ALLCELLS","0.0","number of cells at beginning of month","cfb60ca21c30bb2a7b728a478a02849b.csv" is what it looks like in a text editor; in Numbers (I'm on Mac) it's a regular spreadsheet of values. – user3736201 Jun 13 '14 at 03:14
  • That's more complicated, you really should have said that in the question. – Barmar Jun 13 '14 at 03:16
  • How do I do it over a folder of multiple files? – user3736201 Jun 13 '14 at 03:17
  • Give a wildcard filename argument to `awk`, it will process all the files. – Barmar Jun 13 '14 at 03:19
  • A wildcard filename argument? How do you do that? I was going to put the directory in place of "Textfile.csv" but I don't think that will work? – user3736201 Jun 13 '14 at 03:23
  • `/Users/Larry/Desktop/TestFolder/*.csv` – Barmar Jun 13 '14 at 03:24
  • Do you know the basics of using Unix/Linux? Wildcards are pretty beginner stuff, not even related to programming, just using the shell interactively. – Barmar Jun 13 '14 at 03:25
  • Larry$ awk -F "\"*,\"*" '{print $2}' /Users/Larry/Desktop/modified/*.csv > DidItWork.csv -bash: /usr/bin/awk: Argument list too long – user3736201 Jun 13 '14 at 03:28
  • @user3736201: please edit the extra information into the question, especially where it information like the data format. It is hard to determine the actual layout in a comment. Generally, add information to the question, not in comments, even though you're adding the information to address a question in the comments. – Jonathan Leffler Jun 13 '14 at 03:34

4 Answers


If the directory has so many files that you exceed the argument limit, you should use `find` and `xargs`.

find /Users/Larry/Desktop/modified -type f -maxdepth 1 -name '*.csv' -print0 | 
    xargs -0 awk -F '"*,"*' '{print $2}' > Success.txt
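As a quick sanity check (a one-liner added for illustration, not part of the original answer), the `-F '"*,"*'` separator treats a comma plus any adjacent quotes as the field delimiter, so `$2` comes out without quotes. Run against a shortened version of the sample row from the comments:

```shell
# The regex FS '"*,"*' eats each comma together with surrounding quotes,
# so $2 is the bare second column. (Only the very first and very last
# quote of the line survive, attached to $1 and the last field.)
printf '%s\n' '"CELLS","SUM","1 1","ALLCELLS","0.0"' |
    awk -F '"*,"*' '{print $2}'
# prints: SUM
```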
Barmar
  • It's definitely a step in the right direction but when I enter the command in the shell and hit enter nothing happens. I also tried to put it in a shell script to run, but it didn't do anything there either? The > is to have the found columns go somewhere I can view the values. find /Users/Larry/Desktop/modified -type f -maxdepth 1 -name '.csv' -print0 | xargs -0 awk -F '","' {print $2}' find /Users/Larry/Desktop/modified -type f -maxdepth 1 -name '.csv' -print0 | xargs -0 awk -F '","' '{print $2}' > DidItWorkThisTime.csv – user3736201 Jun 13 '14 at 03:49
  • `'.csv'` should be `'*.csv'`. – Barmar Jun 13 '14 at 03:51
  • If that's what you actually typed, you need to put code inside backticks in comments. Otherwise, `*` is used to make italic and bold words. – Barmar Jun 13 '14 at 03:53
  • It would be best to add an update to your question, showing what you tried. Then you can format it nicely using the `{}` tool in the SO editor. – Barmar Jun 13 '14 at 03:53
  • Did so above, see edit – user3736201 Jun 13 '14 at 04:05
  • Did you really put multiple commands on the same line like that? That won't work. – Barmar Jun 13 '14 at 04:06
  • You're missing all the `*` characters in your command. – Barmar Jun 13 '14 at 04:07
  • I don't really want to spend hours trying to teach you how to execute basic shell commands. You need to learn how to use the shell or hire someone who knows. – Barmar Jun 13 '14 at 04:09
  • I feel like if it was basic the programmer in charge could have done it (he could not) and it would be a simple command (clearly it isn't) – user3736201 Jun 13 '14 at 04:13

Try:

find /Users/Larry/Desktop/TestFolder -type f -maxdepth 1 -name '*.csv' -exec awk -F, '{ print $2 }' '{}' \; > Success.txt

It should execute awk on each csv file found, using a comma to separate fields (-F,), to print the second ($2) field, and redirect the output to Success.txt.

Also, you might swap > Success.txt for | tee Success.txt if you want to see the output AND have it saved to the file, at least while you're testing the command and don't want to wait for all those files to be processed to see if it worked.
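A possible speed-up, if your find supports it (a sketch, not the original command): terminating -exec with + instead of \; batches many files into each awk invocation, much like xargs, while still sidestepping the shell's argument-list limit.

```shell
# Same command with '+' so awk starts once per large batch of files
# rather than once per file. The path is the question's; adjust to taste.
find /Users/Larry/Desktop/TestFolder -maxdepth 1 -type f -name '*.csv' \
    -exec awk -F, '{ print $2 }' {} + > Success.txt
```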

Steve
  • omfg it did something. Stranger, I love you. It says I can't vote up but I would if I could. Infinite vote ups! – user3736201 Jun 13 '14 at 05:32

A simple and straightforward adaptation of the code you already have.

find /Users/Larry/Desktop/TestFolder -maxdepth 1 -mindepth 1 -type f -name '*.csv' |
xargs cut -d, -f2

If you want files, -type d is wrong. I changed that to -type f and added the -name option to select only *.csv files.

for dir in /Users/Larry/Desktop/TestFolder/*
do
  cut -d, -f2 "$dir"/*.csv
done

This is assuming TestFolder contains a number of directories, and each of them contains one or more *.csv files. This can be further simplified to

cut -d, -f2 /Users/Larry/Desktop/TestFolder/*/*.csv

but this could get you the "Argument list too long" error you tried to avoid. (Note that cut needs -d, here; its default field delimiter is TAB, not comma.)

All of these will print to standard out; add >Success.txt at the end to redirect to a file.

tripleee

cut -d',' -f1,2,3 *.csv > result.csv

Assuming the field delimiter in your files is , (a csv file, after all) and that you need columns 1, 2 and 3 in the result.

The above command will have problems if any of the needed columns contain the delimiter inside a quoted field: "...,..."
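If that quoted-comma case actually occurs in the data, one portable workaround (a sketch assuming python3 is on the PATH, since plain cut and awk don't understand CSV quoting) is to let Python's csv module do the parsing:

```shell
# csv.reader honours the quotes, so the embedded comma in "a,b" stays
# inside field 1 and field 2 (index 1) is plain 'c'.
printf '%s\n' '"a,b",c,d' |
    python3 -c 'import csv, sys
for row in csv.reader(sys.stdin):
    print(row[1])'
# prints: c
```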

vadimbog