I have raw, unformatted strings like the ones below in a file.

"],"id":"1785695Jkc","vector":"profile","
"],"id":"jashj24231","vector":"profile","
"],"id":"3201298301","vector":"profile","
"],"id":"1123798749","vector":"profile","

I want to extract only the id values, for example:

1785695Jkc

I tried the following command:

grep -o -P '(?<="],"id":").*(?=",")' myfile.txt >new.txt

but because .* is greedy, the match extends to the last occurrence of "," on the line, like this:

1785695Jkc","vector":"profile

I need the match to stop at the first occurrence instead.

Jeeppp

3 Answers

sed 's/"],"id":"\(.*\)","vector.*/\1/' myfile.txt

This assumes that every line starts with "],"id":" as your input shows. It was written for GNU sed; if your sed is using extended regular expressions (for example with -E), drop the backslashes before the parentheses.
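
For comparison, here is a minimal sketch of the extended-regex form mentioned above. It is not the answer's exact command: it swaps the greedy .* inside the group for [^"]*, which stops at the first double quote, and it assumes the id values never contain a double quote. The file name sample.txt is hypothetical and holds two lines copied from the question.

```shell
# Hypothetical sample file with two of the lines from the question.
cat > sample.txt <<'EOF'
"],"id":"1785695Jkc","vector":"profile","
"],"id":"jashj24231","vector":"profile","
EOF

# -E selects extended regular expressions, so the group is written ( ) without backslashes.
# [^"]* matches up to, but not including, the first double quote after "id":".
ids=$(sed -E 's/^"],"id":"([^"]*)".*/\1/' sample.txt)
printf '%s\n' "$ids"
```

Because the character class cannot cross a quote, the result does not depend on what follows the id on the rest of the line.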

Cwissy

You can extract just the column you want using cut:

cut -f 2 -d , myfile.txt | cut -f 2 -d : | tr -d '"'

The first cut takes the id-value pair ("id":"jashj24231") and the second one extracts just the value from it ("jashj24231"). Finally, tr removes the enclosing quotes.
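
To see what each stage of that pipeline produces, the steps can be run one at a time on a single line. This is only an illustrative sketch; the variable names line, pair, value and id are made up for the example, and it assumes the id contains neither a comma nor a colon, since either would shift the fields.

```shell
# One input line copied from the question.
line='"],"id":"jashj24231","vector":"profile","'

# Stage 1: split on commas and keep field 2, the id-value pair.
pair=$(printf '%s\n' "$line" | cut -d , -f 2)

# Stage 2: split that pair on the colon and keep field 2, the quoted value.
value=$(printf '%s\n' "$pair" | cut -d : -f 2)

# Stage 3: delete the double quotes around the value.
id=$(printf '%s\n' "$value" | tr -d '"')

printf '%s\n' "$pair" "$value" "$id"
```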

Gonzalo Matheu

To extract only the id values as shown above, which appear to be alphanumeric strings of length 10, use:

$ awk 'match($0,/[[:alnum:]]{10}/){print substr($0,RSTART,RLENGTH)}' file
1785695Jkc
jashj24231
3201298301
1123798749

If that definition of the values is not correct, please be more specific about the requirement.
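
If the ids are not always exactly 10 characters long, one possible alternative is to split each line on the double quote character and print the field that holds the id. This is a sketch under the assumption that every line has the same layout as the sample input, so that the id is always the fifth quote-separated field; the variable names are made up for the example.

```shell
# One input line copied from the question.
line='"],"id":"1785695Jkc","vector":"profile","'

# With -F'"' each double quote separates fields:
# $1 is empty, $2 is "],", $3 is "id", $4 is ":", and $5 is the id value.
id=$(printf '%s\n' "$line" | awk -F'"' '{ print $5 }')
printf '%s\n' "$id"
```

This avoids the {10} interval expression, which some older awk implementations do not support.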

Btw, a small change to your grep also works:

$ grep -o -P '(?<="],"id":")[^"]*' file
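
Another way to get the first occurrence, staying closer to the command in the question, is to make the quantifier lazy by writing .*? instead of .* so the match stops at the nearest "," rather than the last one. This is a minimal sketch that assumes a grep with -P (PCRE) support, as the question's own command does; the variable names are made up for the example.

```shell
# One input line copied from the question.
line='"],"id":"1785695Jkc","vector":"profile","'

# .*? is the lazy form of .*: it expands only as far as needed
# for the lookahead (?=",") to succeed, i.e. up to the first ",".
id=$(printf '%s\n' "$line" | grep -o -P '(?<="],"id":").*?(?=",")')
printf '%s\n' "$id"
```
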
James Brown