Questions tagged [file-processing]
333 questions
53 votes · 11 answers
Randomly Pick Lines From a File Without Slurping It With Unix
I have a file of 10^7 lines, from which I want to choose 1/100 of the lines at random.
This is the AWK code I have, but it slurps the entire file content
beforehand. My PC's memory cannot handle such slurps. Is there another approach?
awk…

neversaint · 60,904 reputation · badges 137, 310, 477
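A minimal sketch of a streaming alternative, written in Python rather than AWK: each line is kept independently with probability 1/100, so only the current line is ever held in memory. This keeps roughly, not exactly, 1% of the lines. The name `sample_lines` and its parameters are illustrative, not taken from the question.

```python
import random
from typing import Iterable, Iterator, Optional


def sample_lines(lines: Iterable[str], fraction: float = 0.01,
                 seed: Optional[int] = None) -> Iterator[str]:
    """Yield each line independently with probability `fraction`.

    Works on any iterable of lines, including an open file object, and
    never stores more than the current line. Output keeps file order.
    """
    rng = random.Random(seed)  # seeded generator for reproducible samples
    for line in lines:
        if rng.random() < fraction:  # random() is uniform on [0.0, 1.0)
            yield line
```

For example, `sys.stdout.writelines(sample_lines(sys.stdin))` filters standard input to standard output. The same per-line test can be written in AWK as `awk 'BEGIN { srand() } rand() < 0.01' file`, which also holds only one line at a time.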
49 votes · 4 answers
Splitting command line args with GNU parallel
Using GNU parallel: http://www.gnu.org/software/parallel/
I have a program that takes two arguments, e.g.
$ ./prog file1 file2
$ ./prog file2 file3
...
$ ./prog file23456 file23457
I'm using a script that generates the file name pairs; however, this…

drhodes · 1,009 reputation · badges 1, 10, 17
10 votes · 2 answers
Can I write a file to a folder on a server machine from a Web API app running on it?
I have this code in my Web API app to write to a CSV file:
private void SaveToCSV(InventoryItem invItem, string dbContext)
{
string csvHeader =…

B. Clay Shannon-B. Crow Raven · 8,547 reputation · badges 144, 472, 862
10 votes · 2 answers
Processing a very large text file with lazy Texts and ByteStrings
I'm trying to process a very large Unicode text file (6 GB+). What I want is to count the frequency of each unique word. I use a strict Data.Map to keep track of the counts of each word as I traverse the file.
The process takes too much time and too…

haskelline · 1,116 reputation · badges 7, 15
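The question is about Haskell, but the memory profile it is after can be sketched in Python: read one line at a time and update a word-count map, so memory grows with the number of distinct words rather than with the file size. Splitting on whitespace is an assumption, since the question does not say how words are delimited, and `word_frequencies` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable


def word_frequencies(lines: Iterable[str]) -> Counter:
    """Count whitespace-separated words, consuming one line at a time."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.split())  # add 1 for every word on this line
    return counts
```

With a real file this would be called as `word_frequencies(f)` on a file opened with `encoding="utf-8"`, for example followed by `.most_common(10)` to list the ten most frequent words.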
10 votes · 3 answers
Parallel version of Files.walkFileTree (java or scala)
Does anyone know of a parallel equivalent of Java's Files.walkFileTree, or something similar? It can be a Java or Scala library.

matt · 4,614 reputation · badges 1, 29, 32
8 votes · 6 answers
How to perform a SQL-like Join in Perl?
I have to process some data by combining two different files. Both of them have two columns that together form a primary key I can use to match rows side by side. The files in question are huge (around 5 GB with 20 million rows), so I would need an…

sfactor · 12,592 reputation · badges 32, 102, 152
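One way to join two huge files without loading either into memory is a sort-merge join: sort both files on the key columns first (for example with Unix `sort`), then walk them in lockstep. This Python sketch assumes a two-column key that is unique within each file, as the question's "primary key" suggests, and that both inputs are sorted in the same order Python uses when comparing lists of strings. `merge_join` is a hypothetical helper, not a library function.

```python
from typing import Iterable, Iterator, List

Row = List[str]


def merge_join(left: Iterable[Row], right: Iterable[Row],
               key_cols: int = 2) -> Iterator[Row]:
    """Inner-join two row streams sorted by their first `key_cols` columns.

    Assumes each key occurs at most once per side. Only one row from each
    input is held in memory at a time.
    """
    left_it, right_it = iter(left), iter(right)
    l = next(left_it, None)
    r = next(right_it, None)
    while l is not None and r is not None:
        lk, rk = l[:key_cols], r[:key_cols]
        if lk == rk:
            # Matching key: emit the left row plus the right row's non-key columns.
            yield l + r[key_cols:]
            l = next(left_it, None)
            r = next(right_it, None)
        elif lk < rk:
            l = next(left_it, None)  # left key is behind; advance left
        else:
            r = next(right_it, None)  # right key is behind; advance right
```

Rows could come from `csv.reader(f, delimiter="\t")` over each sorted file. The Unix `join` command applies the same idea, but on a single join field.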
8 votes · 7 answers
How can I get exactly n random lines from a file with Perl?
Following up on this question, I need to get exactly n lines at random out of a file (or stdin). This would be similar to head or tail, except I want some from the middle.
Now, other than looping over the file with the solutions to the linked…

Nathan Fellman · 122,701 reputation · badges 101, 260, 319
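Picking exactly n lines in a single pass, without knowing the line count in advance, is the classic use of reservoir sampling (Algorithm R). A minimal sketch in Python rather than the Perl the question asks for; `reservoir_sample` and its `seed` parameter are illustrative. Memory use is proportional to n, not to the file size.

```python
import random
from typing import Iterable, List, Optional


def reservoir_sample(lines: Iterable[str], n: int,
                     seed: Optional[int] = None) -> List[str]:
    """Return n lines chosen uniformly at random (Algorithm R).

    If the input has fewer than n lines, all of them are returned.
    """
    rng = random.Random(seed)
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        if i < n:
            reservoir.append(line)  # fill the reservoir with the first n lines
        else:
            # Keep line i with probability n / (i + 1), replacing a random slot.
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < n:
                reservoir[j] = line
    return reservoir
```

The returned lines are not necessarily in file order, so sort them afterwards if order matters.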
8 votes · 4 answers
C : Best way to go to a known line of a file
I have a file I'd like to iterate through without processing the current line in any way. What I am looking for is the best way to go to a given line of a text file. For example, storing the current line in a variable seems useless until I…

Badda · 1,329 reputation · badges 2, 15, 40
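Because lines have variable length, the start of line k cannot be computed without reading the lines before it, unless an index of line offsets is built once and reused. A Python sketch of the simple case, assuming 1-based line numbers; `line_at` is a hypothetical name. In C the equivalent is a loop of `fgets` calls that discards lines until the target is reached.

```python
from itertools import islice
from typing import Iterable, Optional


def line_at(lines: Iterable[str], lineno: int) -> Optional[str]:
    """Return the 1-based line `lineno`, or None if the input is shorter.

    The earlier lines are still read, but islice discards them
    without storing them.
    """
    if lineno < 1:
        raise ValueError("lineno must be >= 1")
    return next(islice(lines, lineno - 1, None), None)
```

For repeated jumps into the same file, recording each line's starting byte offset once (for example with `ftell` in C) and then calling `fseek` avoids rereading from the start.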
7 votes · 1 answer
How to write to a file in tab delimited manner?
So I have this data frame from a BED file called input.bed:
V1 V2 V3 V4
1 chr1 11323785 11617177 TF1
2 chr1 12645605 13926923 TF2
3 chr1 14750216 15119039 TF3
4 chr1 18102157 19080189 TF1
5 chr1 29491029 30934636 TF2
6 chr1…

shadow.T · 125 reputation · badges 1, 2, 11
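In R, a data frame like this is commonly written as tab-separated text with `write.table(df, "output.bed", sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)`, where the output filename is hypothetical. To show what such a line looks like, here is a Python sketch that formats rows as tab-separated lines without quoting, as BED files expect; `format_tsv` is a hypothetical name.

```python
from typing import Iterable, Iterator, Sequence


def format_tsv(rows: Iterable[Sequence[object]]) -> Iterator[str]:
    """Yield each row as tab-separated fields terminated by a newline.

    No quoting or escaping is done, which suits BED-style data whose
    fields never contain tabs or newlines.
    """
    for row in rows:
        yield "\t".join(str(field) for field in row) + "\n"
```

An open text file can consume the generator directly with `out.writelines(format_tsv(rows))`.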
7 votes · 2 answers
python append folder name to filenames in all sub folders
I am trying to append the name of a folder to all filenames within that folder. I have to loop through a parent folder that contains subfolders. I have to do this in Python rather than in a batch file.
For example, take these folders:
Parent Folder
Sub1
…

burt46 · 107 reputation · badges 2, 2, 8
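A minimal Python sketch using `os.walk`, assuming "append" means prefixing each file with its immediate folder's name and an underscore, so `Sub1/report.csv` becomes `Sub1/Sub1_report.csv`. The question's example is cut off, so this naming scheme is a guess. The function names and the `dry_run` flag are illustrative; by default nothing is renamed and only the planned renames are returned.

```python
import os
from typing import List, Tuple


def prefixed_name(folder: str, filename: str, sep: str = "_") -> str:
    """Build the new filename, e.g. ('Parent Folder/Sub1', 'a.txt') -> 'Sub1_a.txt'."""
    return f"{os.path.basename(os.path.normpath(folder))}{sep}{filename}"


def prefix_files_with_folder(parent: str, sep: str = "_",
                             dry_run: bool = True) -> List[Tuple[str, str]]:
    """Prefix every file in the subfolders of `parent` with its folder's name.

    Files sitting directly in `parent` are left alone. Returns the planned
    (old_path, new_path) pairs; files are renamed only when dry_run is False.
    """
    parent = os.path.normpath(parent)
    renames: List[Tuple[str, str]] = []
    for dirpath, _dirnames, filenames in os.walk(parent):
        if dirpath == parent:
            continue  # skip the parent folder's own files
        for name in filenames:
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, prefixed_name(dirpath, name, sep))
            renames.append((old_path, new_path))
    # Rename only after the walk has finished, so the tree is not
    # modified while it is being traversed.
    if not dry_run:
        for old_path, new_path in renames:
            os.rename(old_path, new_path)
    return renames
```

Running it twice prefixes the names twice, and on POSIX systems `os.rename` silently replaces an existing file of the same name, so it is worth checking the dry-run output first.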
7 votes · 1 answer
Apache Commons IO File Monitoring vs. JDK WatchService
I need to develop an application that will process CSV files as soon as they are created in a predefined directory. A huge number of incoming files is expected.
I have seen applications using Apache Commons IO File Monitoring in production.…

Saptarshi Basu · 8,640 reputation · badges 4, 39, 58
7 votes · 4 answers
Nodejs Read very large file(~10GB), Process line by line then write to other file
I have a 10 GB log file in a particular format. I want to process this file line by line and then write the output to another file after applying some transformations. I am using Node for this operation.
Though this method works, it takes a hell…

HVT7 · 709 reputation · badges 4, 13, 23
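The question is about Node, but the underlying pattern is the same in any language: iterate over the input one line at a time and write each transformed line out immediately, so memory use stays flat regardless of file size. A Python sketch; `transform_lines` and the convention that returning `None` drops a line are assumptions made for illustration.

```python
from typing import Callable, Iterable, Iterator, Optional


def transform_lines(lines: Iterable[str],
                    transform: Callable[[str], Optional[str]]) -> Iterator[str]:
    """Apply `transform` to each line (without its newline) as it is read.

    A result of None drops the line; every kept line is written back with
    a trailing newline, including a final line that lacked one.
    """
    for line in lines:
        result = transform(line.rstrip("\n"))
        if result is not None:
            yield result + "\n"
```

With real files, `dst.writelines(transform_lines(src, str.upper))` streams from one open file to another, where `src` and `dst` are hypothetical file objects opened for reading and writing.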
7 votes · 3 answers
Read last n bytes of file using Java
I have a crawler program which logs some files. Sometimes on the server, some error happens and the crawler creates massive log files which are somehow impossible to parse. For that reason, I wanted to create a simple program which reads about 1000…

Alireza Noori · 14,961 reputation · badges 30, 95, 179
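Reading the tail of a file is a seek from the end rather than a scan from the start. A Python sketch of the idea; in Java, `RandomAccessFile` with `seek(length() - n)` works the same way. `read_last_bytes` is a hypothetical name, and the file must be opened in binary mode.

```python
import os
from typing import BinaryIO


def read_last_bytes(f: BinaryIO, n: int) -> bytes:
    """Return the last n bytes of a seekable binary file, or all of it if shorter."""
    size = f.seek(0, os.SEEK_END)  # seek() returns the new absolute position
    f.seek(max(0, size - n))       # step back n bytes, but not before the start
    return f.read()
```

Cutting at an arbitrary byte offset can split a multi-byte character or a log line, so a caller may want to discard everything up to the first newline in the result.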
6 votes · 2 answers
Clojure - process huge files with low memory
I am processing text files of 60 GB or larger. The files are separated into a header section of variable length and a data section. I have three functions:
head?: a predicate to distinguish header lines from data lines
process-header: process one header…

waechtertroll · 607 reputation · badges 3, 17
5 votes · 2 answers
ColdFusion 2023 FileRead throwing a 500 error
We have recently gone through the process of upgrading from CF11 to CF2023. On our development servers, everything seemed to work as expected. However, after installing on our production server, we found a weird issue.
The fileRead command in a…

snackboy · 624 reputation · badges 3, 12