Substring in Linux based on first occurrence

Question

I have raw, unformatted strings like the ones below in a file.

"],"id":"1785695Jkc","vector":"profile","
"],"id":"jashj24231","vector":"profile","
"],"id":"3201298301","vector":"profile","
"],"id":"1123798749","vector":"profile","

I want to extract only the id values, like this:

1785695Jkc

I tried the following command:

grep -o -P '(?<="],"id":").*(?=",")' myfile.txt >new.txt

but that matches up to the last occurrence of "," instead, like this:

1785695Jkc","vector":"profile

but I need it to stop at the first occurrence only.

score 0 · Answer 1 · answered Feb 16 '18 at 05:23


sed 's/"],"id":"\(.*\)","vector.*/\1/' myfile.txt

That assumes every line starts with "],"id":", as your input shows. Also, this was written for GNU sed, which uses basic regular expressions by default; if your sed uses extended regular expressions (for example sed -E), drop the backslashes in front of the parentheses.
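For comparison, here is a sketch of the same substitution written for extended regular expressions with sed -E (accepted by both GNU sed and BSD sed): the grouping parentheses lose their backslashes, while the \1 back-reference in the replacement stays the same. It writes the four sample lines from the question to a temporary file; the variable names tmp and ids are only illustrative.

```shell
# Write the sample lines from the question to a temporary file.
tmp=$(mktemp)
printf '%s\n' \
  '"],"id":"1785695Jkc","vector":"profile","' \
  '"],"id":"jashj24231","vector":"profile","' \
  '"],"id":"3201298301","vector":"profile","' \
  '"],"id":"1123798749","vector":"profile","' > "$tmp"

# ERE form of the answer's command: ( ) groups without backslashes,
# and \1 in the replacement still refers to the first group.
ids=$(sed -E 's/"],"id":"(.*)","vector.*/\1/' "$tmp")
printf '%s\n' "$ids"

rm -f "$tmp"
```

Like the original, this relies on ","vector following the id on every line.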

answered Feb 16 '18 at 05:23

Cwissy


score 0 · Answer 2 · answered Feb 16 '18 at 05:35

You can extract just the column you want using cut:

cut -f 2 -d , <filename> | cut -f 2 -d : | tr -d '"'

The first cut takes the id/value pair ("id":"jashj24231") and the second extracts just the value from it ("jashj24231"). Finally, tr removes the enclosing quotes.
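To see what each stage of the pipeline produces, you can run it one stage at a time on a single sample line. This sketch assumes, as the pipeline itself does, that the id value contains no commas or colons; stage1, stage2 and stage3 are hypothetical variable names used only to hold the intermediate results.

```shell
# One sample line from the question.
line='"],"id":"jashj24231","vector":"profile","'

# Stage 1: split on commas and keep field 2, the "id":"..." pair.
stage1=$(printf '%s\n' "$line" | cut -f 2 -d ,)
# Stage 2: split that pair on colons and keep field 2, the quoted value.
stage2=$(printf '%s\n' "$stage1" | cut -f 2 -d :)
# Stage 3: delete every double quote.
stage3=$(printf '%s\n' "$stage2" | tr -d '"')

printf 'after cut -d ,  %s\n' "$stage1"
printf 'after cut -d :  %s\n' "$stage2"
printf 'after tr -d     %s\n' "$stage3"
```

Because the field positions are fixed, a comma or colon inside an id would shift the fields and break the result.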

James Brown · Accepted Answer · 2018-02-16T06:11:39.987


To extract only the id values, which appear to be alphanumeric strings of length 10, use:

$ awk 'match($0,/[[:alnum:]]{10}/){print substr($0,RSTART,RLENGTH)}' file
1785695Jkc
jashj24231
3201298301
1123798749

If that definition of the values is not correct, please be more specific about the requirement.
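If the ids are not always 10 characters long, one option in the same awk style is to match the whole "id":"..." pair and strip the fixed-width prefix and closing quote with substr. This is a sketch, assuming each id is followed by a double quote and contains none itself; the extra line with the 4-character id ab12 is hypothetical input added only to show the difference.

```shell
tmp=$(mktemp)
printf '%s\n' \
  '"],"id":"1785695Jkc","vector":"profile","' \
  '"],"id":"jashj24231","vector":"profile","' \
  '"],"id":"3201298301","vector":"profile","' \
  '"],"id":"1123798749","vector":"profile","' \
  '"],"id":"ab12","vector":"profile","' > "$tmp"   # hypothetical short id

# [^"]* stops at the next double quote, so the id can be any length.
# RSTART + 6 skips the 6-character prefix "id":" and
# RLENGTH - 7 drops that prefix plus the closing quote.
ids=$(awk 'match($0, /"id":"[^"]*"/) {
  print substr($0, RSTART + 6, RLENGTH - 7)
}' "$tmp")
printf '%s\n' "$ids"

rm -f "$tmp"
```

With the same input, the {10} version prints nothing for the ab12 line, since no run of 10 alphanumeric characters occurs on it. This variant also avoids the {10} interval expression.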

By the way, a small change to your grep also works:

$ grep -o -P '(?<="],"id":")[^"]*' file
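The original command failed because .* is greedy: it extends as far as it can and only backtracks to the last "," on the line. The [^"]* version works because it cannot cross a quote. Another small change is to make the quantifier lazy with .*?, which PCRE (and therefore grep -P) supports, so the match stops at the first "," instead. A minimal sketch, assuming GNU grep built with PCRE support and the sample lines from the question:

```shell
tmp=$(mktemp)
printf '%s\n' \
  '"],"id":"1785695Jkc","vector":"profile","' \
  '"],"id":"jashj24231","vector":"profile","' \
  '"],"id":"3201298301","vector":"profile","' \
  '"],"id":"1123798749","vector":"profile","' > "$tmp"

# .*? is lazy: it grows one character at a time and stops at the first
# position where the lookahead (?=",") succeeds.
ids=$(grep -o -P '(?<="],"id":").*?(?=",")' "$tmp")
printf '%s\n' "$ids"

rm -f "$tmp"
```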

edited Feb 16 '18 at 06:11

answered Feb 16 '18 at 05:58

James Brown

