I am trying to write a bash script that finds all rows in a CSV file whose first field is a number greater than or equal to 10 and lower than 99.

So far I only have this:

#!/bin/bash

grep -E '^[1-9]\d,|9[0-8]' file.csv

The regular expression ^[1-9]\d,|9[0-8] works fine when I test it at https://regex101.com/, but it does not work on the command line.

Can anyone see what I am doing wrong? (:

I have attached some sample data. I am interested in the column "user" (the first column); its values are in the range 1-200.

1,M,30,urdu,english 
2,F,26,finnish,english swedish german
3,M,20,finnish,english french swedish 
4,F,20,finnish,english swedish 
5,F,29,finnish,english 
6,F,23,swedish,finnish english 
7,F,19,swedish,finnish english french 
8,F,25,finnish,english swedish german russian french estonian
9,F,27,finnish,english italian swedish spanish french
10,F,20,finnish,english 
11,F,26,finnish,english swedish
12,F,27,finnish,english swedish french spanish
13,F,30,finnish,english russian swedish 
14,F,28,finnish,english swedish spanish german 
15,M,34,finnish,swedish english german spanish russian 
16,F,29,finnish,english swedish french spanish estonian 
17,F,19,swedish,finnish english french korean
18,M,27,finnish,english swedish german russian spanish dutch
19,F,27,finnish,english swedish russian 
20,F,26,finnish,english swedish 
21,M,23,finnish,english swedish
22,M,30,english,finnish 
23,F,25,finnish,swedish english spanish 
24,F,21,finnish,english swedish spanish 
25,F,26,finnish,english swedish
26,M,20,polish,english spanish finnish 
27,M,25,finnish,english french 
28,F,21,russian,finnish english french 
Olaola
  • Please, post some sample data with the related expected output. Don't post them as comments, images, tables or links to off-site services but use text and include them to your original question. Thanks. – James Brown Nov 28 '21 at 11:15
  • @JamesBrown data attached. Thank you! – Olaola Nov 28 '21 at 11:19
  • Non-broadcast answer : `1\d|[2-8]\d|9[0-8] ` – sln Nov 28 '21 at 20:10

2 Answers

\d is not valid for POSIX basic or extended regular expressions.

You could use this:

grep -E '^([1-8][0-9]|9[0-8])(,|$)' file.csv

(,|$) allows for a trailing comma, or the end of the line, if it's a single column.
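To make the effect of the grouping visible, here is a small sketch that runs the ungrouped pattern from the comments and the grouped pattern above against a few made-up rows. The rows and the variable names `tmp`, `loose` and `strict` are assumptions for illustration only, not part of the original data.

```shell
# Hypothetical sample rows written to a temporary file.
tmp=$(mktemp)
printf '%s\n' '5,F,90' '9,F,27' '10,F,20' '98,M,31' '99,F,40' '100,M,22' > "$tmp"

# Ungrouped alternation: only the first alternative is anchored with ^,
# so 9[0-8] can match anywhere on the line (e.g. the 90 in '5,F,90').
loose=$(grep -E '^[1-9][0-9],|9[0-8]' "$tmp")

# Grouped alternation: ^ applies to both alternatives, and (,|$) requires
# the number to end right after the two digits.
strict=$(grep -E '^([1-8][0-9]|9[0-8])(,|$)' "$tmp")

printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
rm -f "$tmp"
```

With these rows, the ungrouped pattern also prints `5,F,90` and `99,F,40`, while the grouped one prints only `10,F,20` and `98,M,31`.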

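Given your data example, you may be better off with sed '10,98!d' file.csv, which prints lines 10 through 98. Note that this selects by line number, so it only gives the right rows if each row's number matches its line number.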
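As a quick check of the sed approach, the sketch below generates 200 made-up rows whose first field equals the line number and keeps lines 10 through 98. The generated data and the variable names are assumptions for illustration; a real file with a header line would shift the range by one.

```shell
# Hypothetical data: 200 rows whose first field equals the line number.
picked=$(awk 'BEGIN { for (i = 1; i <= 200; i++) print i ",x" }' | sed '10,98!d')

# Inspect the first row, the last row and the number of rows kept.
first=$(printf '%s\n' "$picked" | head -n 1)
last=$(printf '%s\n' "$picked" | tail -n 1)
count=$(printf '%s\n' "$picked" | awk 'END { print NR }')

printf 'first=%s last=%s count=%s\n' "$first" "$last" "$count"
```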

dan
  • Thank you @Dan! Your regex did not solve my problem 100%, as it matches numbers from other columns and numbers higher than 100. But your tip about the \d helped me! I got the result with the expression grep -E '^[0-9]{2},|9[0-8]' file.csv – Olaola Nov 28 '21 at 11:28
  • @Olaola I left out the first set of parentheses originally, try again. – dan Nov 28 '21 at 11:30

Consider using awk for such tasks:

$ awk -F, '$1>=10 && $1<99' file.csv
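The same idea can be sketched with the bounds passed in as awk variables, plus a guard that skips non-numeric first fields such as a header line. The sample rows, the header, and the names `lo`, `hi`, `tmp` and `result` are assumptions for illustration, not part of the original answer.

```shell
# Hypothetical sample rows, including a header line, in a temporary file.
tmp=$(mktemp)
printf '%s\n' 'user,sex,age' '9,F,27' '10,F,20' '98,M,31' '99,F,40' '100,M,22' > "$tmp"

# -F, splits on commas; -v passes the bounds in as awk variables.
# The regex guard keeps only rows whose first field is all digits,
# and the comparisons on $1 are then numeric.
result=$(awk -F, -v lo=10 -v hi=99 '$1 ~ /^[0-9]+$/ && $1 >= lo && $1 < hi' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Because the comparison is numeric rather than textual, changing the range only means changing `lo` and `hi`, with no need to rework a regular expression.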
James Brown