
I have a CSV with around 25k rows. I have to pick 1000 rows at a time from column#1 and column#2, then the next 1000 rows, and so on.

I am using the command below, and it works fine in picking up all the values from column#1 and column#2, i.e. 25k fields from both columns. But I want to pick the values in chunks of 1000 (1-1000, then 1001-2000, then 2001-3000 and so on), put each chunk in the WHERE ... IN clause of my SQL export query, and append the results to the dbData.csv file.

My code is below:

awk -F ',' 'NR > 2 {print $1}'  $INPUT > column1.txt
i=$(cat column1.txt | sed -n -e 'H;${x;s/\n/,/g;s/^,//;p;}')

awk -F ',' 'NR > 2 {print $2}'  $INPUT > column2.txt
j=$(cat column2.txt | sed -n -e 'H;${x;s/\n/,/g;s/^,//;p;}')

echo "Please wait - connecting to database..."
db2 connect to  $sourceDBStr user  user123 using pas123

db2 "export to dbData.csv of del select partnumber,language_id as LanguageId from  CATENTRY c , CATENTDESC cd where c.CATENTRY_ID=cd.CATENTRY_ID and c.PARTNUMBER in  ($i) and cd.language_id in ($j)"
  • Welcome to SO, thanks for sharing your efforts. Please do mention what you are trying to do when you fetch the first 1000 lines and so on. Are you aiming to run a command OR print something? Please do add more details in your question so that we can try to help you, cheers. – RavinderSingh13 Dec 15 '21 at 13:02
  • I want to fetch some data from the DB, and when putting 25k values in my export query it says too many values, so the export is not possible. So I want to export the data in chunks of 1000 rows. i=$(cat column1.txt | sed -n -e 'H;${x;s/\n/,/g;s/^,//;p;}') db2 "export to dbData.csv of del select partnumber,language_id as LanguageId from CATENTRY c , CATENTDESC cd where c.CATENTRY_ID=cd.CATENTRY_ID and c.PARTNUMBER in ($i); – Preeti Rajgariya Dec 15 '21 at 13:06
  • [edit] your question to include that and all other information, don't put it in comments where it can't be formatted and could easily be missed. Also, you never need sed when you're using awk so make sure to show us the final output format you want, not the format you're currently thinking you'll need to run sed on. Be sure to show a [mcve] that gets split, say, every 3 lines instead of 1000. – Ed Morton Dec 15 '21 at 13:11
  • `awk '{print $1}' CatalogEntriesDescription.csv | split -l 1000`? – Renaud Pacalet Dec 15 '21 at 13:12
  • Does this answer your question? [How can I split a large text file into smaller files with an equal number of lines?](https://stackoverflow.com/questions/2016894/how-can-i-split-a-large-text-file-into-smaller-files-with-an-equal-number-of-lin) – tripleee Dec 15 '21 at 13:15
  • I have a big CSV from which I have to pick the values of the 1st and 2nd columns, which I achieved. Now I want to pick 1000 rows every time and pass those values to my SQL query. I think split will not work here. – Preeti Rajgariya Dec 15 '21 at 13:22
  • Why do you think `split` will _not work here_? Did you try? What went wrong? Why do you mention only the first column in your question and then also the second column in your comments? – Renaud Pacalet Dec 15 '21 at 13:41
  • I am new to shell scripting. If you can fit this split into my awk command then it will be helpful. What I know about split is that it will create many different files, which I don't want. I just want to pick 1000 rows from column1 and pass that to my SQL.. and yes, I am working on two columns. I showed 1 column as an example. Please help me pick the value of column 1. The rest I will see.. – Preeti Rajgariya Dec 15 '21 at 13:52
  • @PreetiRajgariya May I suggest that you take a look at the [help center](https://stackoverflow.com/help) and especially at the [asking section](https://stackoverflow.com/help/asking)? If you want useful and accurate answers you must first ask a complete and accurate question. In your question you store the extracted part in a file. If that is not really what you want, how do you expect us to guess what it could be? Tell us and help us help you. – Renaud Pacalet Dec 15 '21 at 14:16
  • @PreetiRajgariya I did not vote to close but I understand why others did: you do not provide enough information. And you are wrong, there is nothing rude about this. Several people spent time trying to understand and asked questions in comments to try to clarify. If you take their remarks into account and ask something clear and accurate you will very likely get the help you need. – Renaud Pacalet Dec 15 '21 at 14:19
  • Just read the comments above. We told you what you need to do to ask a good question so it wouldn't get closed, you just totally ignored our advice and so the question did get closed, and then you call US rude. Imagine taking your car to the mechanic to be fixed and they tell you you need to give them the keys so they can investigate the problem but you refuse to hand the keys over and then tell the mechanic they're rude for not fixing your car. – Ed Morton Dec 15 '21 at 14:37
  • Okay.. Let me explain it again. I will have 1 CSV that the client will send us; I am not even sure how big it can be. I have one sample CSV which has around 25k rows. I have to pick the values from the first two columns into variables, let's say $i and $j, and pass those values to my SQL export query. I tried with all 25K values but it said the argument list is too big. So I am thinking of taking 1000 rows at a time: 1-1000, 1001-2000, 2001-3000 and so on. I tried the 'split' command also but it created 25 different files with 1000 records each, which will not serve my purpose. – Preeti Rajgariya Dec 15 '21 at 14:43
  • I have written the below piece of code. Every time I want to run the export query with 1000 rows and append the exported data to my dbData.csv. Please help. – Preeti Rajgariya Dec 15 '21 at 14:45
  • awk -F ',' 'NR > 2 {print $1}' $INPUT > column1.txt i=$(cat column1.txt | sed -n -e 'H;${x;s/\n/,/g;s/^,//;p;}') awk -F ',' 'NR > 2 {print $2}' $INPUT > column2.txt j=$(cat column2.txt | sed -n -e 'H;${x;s/\n/,/g;s/^,//;p;}') echo "Please wait - connecting to database..." db2 connect to $sourceDBStr user user123 using pass123 db2 "export to dbData.csv of del select partnumber,language_id as LanguageId from CATENTRY c , CATENTDESC cd where c.CATENTRY_ID=cd.CATENTRY_ID and c.PARTNUMBER in ($i) and cd.language_id in ($j)" – Preeti Rajgariya Dec 15 '21 at 14:45
  • @PreetiRajgariya Please do not answer in comments. As you can see the formatting is terrible. **Edit your question** and make it clear and complete. – Renaud Pacalet Dec 15 '21 at 14:48
  • Again - **read the comments** you already got. They specifically tell you what to do, and what not to do including posting information in comments instead of in your question. – Ed Morton Dec 15 '21 at 14:56
  • You added code but still no sample input and expected output, so you've so far provided one third of what we asked of you so we can help you. As I mentioned in [my first comment above](https://stackoverflow.com/questions/70364033/to-read-and-print-1st-1000-rows-from-a-csv-using-awk-command-and-then-next-1000#comment124382574_70364033), be sure to show a [mcve] with concise, testable sample input and expected output, e.g. a file that's, say, 10 lines long and gets split, say, every 3 lines instead of 1000. – Ed Morton Dec 15 '21 at 20:35

1 Answer


Let's assume the first two fields of your input CSV are "simple" (no spaces, no commas...) and do not need any kind of quoting. You could then generate the tricky part of your query string with an awk script:

# foo.awk - collect fields 1 and 2 of the records numbered first..last
# and print them as: <field-1 list>) <midstr> (<field-2 list>
# (the caller adds the outermost opening and closing parentheses)
NR >= first && NR <= last {
    c1[n+0] = $1    # partnumber; n+0 forces subscript "0" even when n is still uninitialized
    c2[n++] = $2    # language_id
}
END {
    # first list, comma-separated
    for(i = 0; i < n-1; i++) printf("%s,", c1[i])
    # close the first IN list, insert midstr, open the second one
    printf("%s) %s (%s", c1[n-1], midstr, c2[0])
    # rest of the second list, comma-separated
    for(i = 1; i < n; i++) printf(",%s", c2[i])
}
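
For instance, on a tiny sample file (the header and the field values below are made up for illustration), the script would produce:

$ cat sample.csv
PARTNUMBER,LANGUAGE_ID
A100,-1
A200,-5
A300,-1
$ awk -F ',' -f foo.awk -v midstr="and cd.language_id in" -v first=2 -v last=4 sample.csv
A100,A200,A300) and cd.language_id in (-1,-5,-1

The leading and trailing parentheses are missing on purpose: the calling script, shown next, adds them around the whole string.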

Then use it in a bash loop to process 1000 records per iteration, storing the result of each query in a temporary file (e.g. tmp.csv in the following bash script) that you concatenate to your dbData.csv file. The following example bash script uses the same parameters as you do (INPUT, sourceDBStr) and the same constants (dbData.csv, 1000, user123, pas123); adapt it if you need more flexibility. Error management (input file not found, DB connection error, DB query error...) is left as a bash exercise (but should be done).

prefix="export to tmp.csv of del select partnumber,language_id as LanguageId from  CATENTRY c , CATENTDESC cd where c.CATENTRY_ID=cd.CATENTRY_ID and c.PARTNUMBER in"
midstr="and cd.language_id in"

rm -f dbData.csv
len=$(cat "$INPUT" | wc -l)
for (( first = 2; first <= len - 999; first += 1000 )); do
    (( last = len < first + 999 ? len : first + 999 ))
    query=$(awk -F ',' -f foo.awk -v midstr="$midstr" -v first="$first" \
      -v last="$last" "$INPUT")
    echo "Please wait - connecting to database..."
    db2 connect to $sourceDBStr user user123 using pas123
    db2 "$prefix ($query)"
    cat tmp.csv >> dbData.csv
done
rm -f tmp.csv
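
If you first want to sanity-check the chunking without a database connection (as suggested in the comments: a small file split every 3 lines instead of 1000), a minimal sketch that prints the queries instead of running them, reusing the prefix, midstr and len variables defined above, could be:

for (( first = 2; first <= len; first += 3 )); do
    (( last = len < first + 2 ? len : first + 2 ))
    query=$(awk -F ',' -f foo.awk -v midstr="$midstr" -v first="$first" -v last="$last" "$INPUT")
    echo "db2 would export with: $prefix ($query)"   # no real db2 call here
done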

But there are other ways, using split, bash arrays, and simpler awk or sed scripts. For example:

declare -a arr=()
prefix="export to tmp.csv of del select partnumber,language_id as LanguageId from  CATENTRY c , CATENTDESC cd where c.CATENTRY_ID=cd.CATENTRY_ID and c.PARTNUMBER in"
midstr="and cd.language_id in"

# keep only the first two fields and split them into chunks of 1000 lines
# named foobaraa, foobarab, foobarac...
awk -F, 'NR>1 {print $1, $2}' "$INPUT" | split -l 1000 - foobar

rm -f dbData.csv
for f in foobar*; do
  # build the two comma-separated value lists for this chunk
  arr=($(awk '{print $1 ","}' "$f"))
  i="${arr[*]}"
  arr=($(awk '{print $2 ","}' "$f"))
  j="${arr[*]}"
  echo "Please wait - connecting to database..."
  db2 connect to "$sourceDBStr" user user123 using pas123
  db2 "$prefix (${i%,}) $midstr (${j%,})"   # ${i%,} strips the trailing comma
  cat tmp.csv >> dbData.csv
  rm -f "$f"
done
rm -f tmp.csv
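
In case the parameter expansions look mysterious, here is a minimal sketch of what they do on a hypothetical 3-line chunk file foobaraa (made-up values):

$ cat foobaraa
A100 -1
A200 -5
A300 -1
$ arr=($(awk '{print $1 ","}' foobaraa)); i="${arr[*]}"
$ echo "(${i%,})"
(A100 , A200 , A300 )

The extra spaces come from awk's output field separator and from the array expansion, but they are harmless inside a SQL IN list.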