I have a long dictionary list:

river 4
    ground: 1
    journey: 1
    longitude: 1
    main: 1
    world--four: 1
    contrary: 1
    cover: 1
    delaware: 1
    remarkable: 1
    vast: 1
    forty-five: 1
    crookedest: 1
    territories: 1
    spread: 1
    country: 1
    longest: 1
    fly: 1
    atlantic: 1
    crow: 1
    supply: 1
    seems: 1
    idaho: 1
    seaboard: 1
    states: 1
    ways: 1
    degrees: 1
    part: 1
    twenty-eight: 1
    pacific: 1
    branch: 1
    water: 1
    considering: 1
    six: 1
    safe: 1
    commonplace: 1
    draws: 1
    drainage-basin: 1
    uses: 1
    seventy-five: 1
    slope--a: 1
    missouri: 1
mississippi 3
    area: 1
    steamboats: 1
    germany: 1
    reading: 1
    france: 1
    proper: 1
    fifty-four: 1
    turkey: 1
    exceptionally: 1
    areas: 1
    carries: 1
    combined: 1
    flats: 1
    receives: 1
    england: 1
    italy: 1
    scotland: 1
    wales: 1
    almost: 1
    navigable: 1
    austria: 1
    region: 1
    wide: 1
    spain: 1
    subordinate: 1
    drainage-basin: 1
    hundreds: 1
    keels: 1
    portugal: 1
    water: 1
    gulf: 1
    ireland: 1
    rivers: 1
    valley: 1
    fertile: 1
    worth: 1
water 3
    steamboats: 1
    spread: 1
    country: 1
    states: 1
    longitude: 1
    fifty-four: 1
    pacific: 1
    vast: 1
    subordinate: 1
    carries: 1
    keels: 1
    flats: 1
    supply: 1
    receives: 1
    atlantic: 1
    forty-five: 1
    river: 1
    rivers: 1
    idaho: 1
    mississippi: 1
    seaboard: 1
    navigable: 1
    discharges: 1
    degrees: 1
    twenty-eight: 1
    drainage-basin: 1
    hundreds: 1
    st: 1
    gulf: 1
    draws: 1
    delaware: 1
    territories: 1
    slope--a: 1
drainage-basin 2
    area: 1
    spread: 1
    country: 1
    states: 1
    mississippi: 1
    longitude: 1
    france: 1
    proper: 1
    vast: 1
    turkey: 1
    forty-five: 1
    areas: 1
    combined: 1
    germany: 1
    exceptionally: 1
    valley: 1
    supply: 1
    fertile: 1
    atlantic: 1
    italy: 1
    river: 1
    idaho: 1
    wales: 1
    almost: 1
    seaboard: 1
    spain: 1
    austria: 1
    region: 1
    degrees: 1
    twenty-eight: 1
    wide: 1
    england: 1
    portugal: 1
    water: 1
    ireland: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    scotland: 1
    slope--a: 1
area 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    england: 1
    turkey: 1
    exceptionally: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
journey 1
    ground: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
seems 1
    ground: 1
    journey: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
states 1
    spread: 1
    country: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
slope--a 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    twenty-eight: 1
    river: 1
    idaho: 1
remarkable 1
    contrary: 1
    river: 1
    commonplace: 1
    ways: 1
vast 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    pacific: 1
    forty-five: 1
    water: 1
    seaboard: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
forty-five 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    pacific: 1
    water: 1
    seaboard: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    twenty-eight: 1
    river: 1
    idaho: 1
crookedest 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
carries 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
germany 1
    area: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
longest 1
    main: 1
    river: 1
    world--four: 1
    branch: 1
    missouri: 1
    considering: 1
flats 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    rivers: 1
    receives: 1
supply 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    twenty-eight: 1
    river: 1
    idaho: 1
receives 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
crow 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
scotland 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    spain: 1
    italy: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
country 1
    spread: 1
    idaho: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
thames 1
    thirty-eight: 1
    rhine: 1
    lawrence: 1
    twenty-five: 1
england 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    region: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
navigable 1
    mississippi: 1
    steamboats: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
austria 1
    area: 1
    germany: 1
    mississippi: 1
    france: 1
    proper: 1
    region: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    exceptionally: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
rhine 1
    thirty-eight: 1
    thames: 1
    lawrence: 1
    twenty-five: 1
part 1
    ground: 1
    journey: 1
    seems: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
twenty-eight 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
branch 1
    main: 1
    longest: 1
    river: 1
    world--four: 1
    missouri: 1
    considering: 1
hundreds 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
st 1
    water: 1
    discharges: 1
considering 1
    main: 1
    longest: 1
    river: 1
    world--four: 1
    branch: 1
    missouri: 1
six 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    fly: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
gulf 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    flats: 1
    rivers: 1
    receives: 1
ireland 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    valley: 1
safe 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
commonplace 1
    contrary: 1
    river: 1
    remarkable: 1
    ways: 1
draws 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    supply: 1
    delaware: 1
    territories: 1
    atlantic: 1
    twenty-eight: 1
    river: 1
    idaho: 1
delaware 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    territories: 1
    atlantic: 1
    supply: 1
    twenty-eight: 1
    river: 1
    idaho: 1
thirty-eight 1
    thames: 1
    rhine: 1
    lawrence: 1
    twenty-five: 1
longitude 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    twenty-eight: 1
    river: 1
    idaho: 1
world--four 1
    main: 1
    longest: 1
    river: 1
    branch: 1
    missouri: 1
    considering: 1
lawrence 1
    thirty-eight: 1
    thames: 1
    rhine: 1
    twenty-five: 1
ground 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
steamboats 1
    mississippi: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
spread 1
    seaboard: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
idaho 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
reading 1
    mississippi: 1
    worth: 1
almost 1
    area: 1
    germany: 1
    austria: 1
    france: 1
    proper: 1
    england: 1
    turkey: 1
    exceptionally: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    mississippi: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
contrary 1
    river: 1
    remarkable: 1
    commonplace: 1
    ways: 1
cover 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
    fly: 1
france 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    proper: 1
    england: 1
    turkey: 1
    exceptionally: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
spain 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
pacific 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    twenty-eight: 1
    river: 1
    idaho: 1
turkey 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
fifty-four 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    hundreds: 1
    keels: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
subordinate 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    water: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
territories 1
    spread: 1
    idaho: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    supply: 1
    atlantic: 1
    slope--a: 1
    river: 1
    country: 1
combined 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
exceptionally 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    england: 1
    turkey: 1
    region: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
region 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
twenty-five 1
    thirty-eight: 1
    thames: 1
    lawrence: 1
    rhine: 1
rivers 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    carries: 1
    fifty-four: 1
    keels: 1
    hundreds: 1
    subordinate: 1
    water: 1
    gulf: 1
    flats: 1
    receives: 1
fly 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    seventy-five: 1
    river: 1
atlantic 1
    spread: 1
    longitude: 1
    country: 1
    states: 1
    degrees: 1
    slope--a: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    river: 1
    supply: 1
    twenty-eight: 1
    idaho: 1
italy 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
main 1
    world--four: 1
    longest: 1
    river: 1
    branch: 1
    missouri: 1
    considering: 1
areas 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    england: 1
    turkey: 1
    exceptionally: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
seaboard 1
    spread: 1
    country: 1
    states: 1
    degrees: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
fertile 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
ways 1
    contrary: 1
    river: 1
    remarkable: 1
    commonplace: 1
discharges 1
    water: 1
    st: 1
degrees 1
    spread: 1
    country: 1
    states: 1
    longitude: 1
    twenty-eight: 1
    drainage-basin: 1
    vast: 1
    forty-five: 1
    water: 1
    seaboard: 1
    pacific: 1
    draws: 1
    delaware: 1
    territories: 1
    atlantic: 1
    supply: 1
    slope--a: 1
    river: 1
    idaho: 1
wide 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
proper 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    england: 1
    turkey: 1
    exceptionally: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1
keels 1
    mississippi: 1
    steamboats: 1
    navigable: 1
    water: 1
    fifty-four: 1
    hundreds: 1
    subordinate: 1
    carries: 1
    gulf: 1
    flats: 1
    rivers: 1
    receives: 1
portugal 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    ireland: 1
    valley: 1
worth 1
    mississippi: 1
    reading: 1
uses 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    fly: 1
    seventy-five: 1
    river: 1
seventy-five 1
    ground: 1
    journey: 1
    seems: 1
    part: 1
    cover: 1
    crow: 1
    crookedest: 1
    six: 1
    safe: 1
    uses: 1
    river: 1
    fly: 1
valley 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    wales: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
missouri 1
    main: 1
    longest: 1
    river: 1
    branch: 1
    world--four: 1
    considering: 1
wales 1
    area: 1
    germany: 1
    austria: 1
    mississippi: 1
    france: 1
    proper: 1
    exceptionally: 1
    turkey: 1
    england: 1
    areas: 1
    combined: 1
    scotland: 1
    italy: 1
    spain: 1
    almost: 1
    fertile: 1
    region: 1
    wide: 1
    drainage-basin: 1
    portugal: 1
    ireland: 1
    valley: 1

I want to calculate the similarity between these dictionaries using the cosine metric: [image: cosine metric formula]

where p and q are, I guess, the keys of the dictionaries. I also want to add a function that, given an input word, finds its dictionary and returns its similarity to all the other dictionaries in descending order. The desired output is:

Enter conceptword (or blank line to end): river 
river
    water   0.489
    mississippi 0.052
    spain   0.033
    cairo   0.000
Synonym for river is water

Can anyone help, provide a solution, or explain how to extract the shared words' values for the calculation? Thanks.

Jim Ye
  • Are you comfortable using sklearn? It provides both http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.DictVectorizer.html and http://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.cosine_similarity.html – Ereli May 24 '18 at 05:04
  • @Ereli - If you could, do you have a solution or suggestion using list comprehensions? I only know a few modules. – Jim Ye May 24 '18 at 05:19
  • Did you have a look at https://stackoverflow.com/a/15174569/1265980 ? – Ereli May 24 '18 at 05:55
  • Possible duplicate of [Calculate cosine similarity given 2 sentence strings](https://stackoverflow.com/questions/15173225/calculate-cosine-similarity-given-2-sentence-strings) – Ereli May 24 '18 at 05:56

1 Answer

I'm not entirely sure about your problem setup, but I imagine this could get you started. Suppose your collection of dictionaries looks like this:

import math

dict_list = {
    "water": {
        "ground": 1,
        "journey": 1,
        "longitude": 1,
        "main": 1,
        "contrary": 1,
        "cover": 1,
        "delaware": 1,
        "remarkable": 1
    },
    "mississippi": {
        "main": 1,
        "contrary": 1,
        "cover": 1,
        "delaware": 1,
        "remarkable": 1,
        "steamboats": 1,
        "germany": 1
    }
}

If you want to apply the cosine rule to them, you need to set up vectors giving the count of each word that appears in either dictionary (i.e., if a word shows up in one and not the other, its entry must be zero in the other):

def setup_vec(dict1, dict2):
    # Use the union of keys in one fixed order: dict1's keys first,
    # then the keys that only appear in dict2.
    keys = list(dict1.keys()) + [k for k in dict2.keys() if k not in dict1]
    # Missing words count as 0. The input dictionaries are left unmodified,
    # so repeated comparisons do not add zero entries to dict_list.
    vec1 = [dict1.get(k, 0) for k in keys]
    vec2 = [dict2.get(k, 0) for k in keys]
    return [vec1, vec2]

Note that the order of the vector elements is important: both vectors must list the words in the same order. If you run this with "water" and "mississippi" you get:

[[1, 1, 1, 1, 1, 1, 1, 1, 0, 0], [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]]

After that, you just need to apply the rule:

def dot_prod(p, q):
    return sum([p[x]*q[x] for x in range(0, len(p))])

def norm(p):
    return math.sqrt(dot_prod(p, p))

def cosine_metric(p, q):
    return dot_prod(p, q)/(norm(p) * norm(q))
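
As an aside on extracting the shared words' values: the same metric can be computed directly on the dictionaries, because only words present in both contribute to the dot product. The following is a minimal sketch rather than a definitive implementation; the name `cosine_similarity` and the choice to return 0.0 for an empty dictionary are assumptions made here.

```python
import math

def cosine_similarity(p, q):
    """Cosine similarity of two sparse count dictionaries."""
    # Only keys present in both dictionaries contribute to the dot product.
    shared = set(p) & set(q)
    dot = sum(p[k] * q[k] for k in shared)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # assumed convention for empty dictionaries
    return dot / (norm_p * norm_q)

water = {"ground": 1, "journey": 1, "longitude": 1, "main": 1,
         "contrary": 1, "cover": 1, "delaware": 1, "remarkable": 1}
mississippi = {"main": 1, "contrary": 1, "cover": 1, "delaware": 1,
               "remarkable": 1, "steamboats": 1, "germany": 1}
print(round(cosine_similarity(water, mississippi), 3))  # 0.668
```

This avoids building the padded vectors at all, which may matter when the dictionaries are large and mostly disjoint.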

And the function which computes them for a given dictionary against all others:

def find_dict(word):
    if word in dict_list.keys():
        for other_dicts in dict_list.keys():
            if other_dicts != word:
                vecs = setup_vec(dict_list[word], dict_list[other_dicts])
                print(other_dicts + " " + str(cosine_metric(vecs[0], vecs[1])))

The results here are:

find_dict('water')
mississippi 0.6681531047810609

The remaining task is to order the output, which shouldn't be hard to do.
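
One way the ordering could be done is to collect the scores in a list and sort it with `sorted`. This is a sketch under assumptions: `rank_similar` and `cosine` are hypothetical helper names, and the toy data is invented purely for illustration.

```python
import math

def cosine(p, q):
    # Dot product over p's words; words missing from q count as 0.
    dot = sum(count * q.get(word, 0) for word, count in p.items())
    norms = (math.sqrt(sum(c * c for c in p.values()))
             * math.sqrt(sum(c * c for c in q.values())))
    return dot / norms if norms else 0.0

def rank_similar(word, dict_list):
    """Return (other_word, similarity) pairs, most similar first."""
    target = dict_list[word]
    scores = [(other, cosine(target, vec))
              for other, vec in dict_list.items() if other != word]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy data, invented for illustration only.
toy = {
    "river": {"water": 1, "vast": 1},
    "water": {"water": 1, "vast": 1, "gulf": 1},
    "thames": {"rhine": 1},
}
for other, score in rank_similar("river", toy):
    print("    %s\t%.3f" % (other, score))
```

The first element of the returned list is then the candidate to report as the synonym.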

user387832
  • Thank you so much for the code. But as far as I can remember, shouldn't the denominator be the square root of the sum of squares of the counts for each term in the dictionary? Also, how can I edit it to display the similarity for all the other dictionaries when I input the target dictionary? – Jim Ye May 24 '18 at 07:35
  • Multiplying the norms of two vectors is equivalent to taking the square root of the product of their sums of squares. To change the display, you just need to edit the print statement; other_dicts is the name of the dictionary you are comparing against. – user387832 May 24 '18 at 07:43
  • Hi, I have a question about converting a string to a dictionary. How can I get rid of the number 3 after "water" and convert it to the desired dictionary for the calculation? – Jim Ye May 24 '18 at 13:28
  • It depends on how it is currently stored. If it is all just one long string, you'll have to use the fact that each line is a new word and each dictionary title is not followed by a colon whereas the contents are. It might be worth posting another question asking about that specifically. – user387832 May 24 '18 at 13:54
  • Hi, I have posted a new question about changing the keys in the dictionary. Without changing the keys, I am afraid the find_dict function cannot work properly. Could you help? Here is the [link](https://stackoverflow.com/questions/50512639/how-to-change-dictionary-keys-format-to-string-in-python) – Jim Ye May 24 '18 at 16:21
  • I can't post an answer as the question has been marked as a duplicate. I imagine replacing `for k, v in final_results:` with `for k, v in final_results.iteritems():` and `dict_list[(k)] = final_results.pop(k, v)` with `dict_list[k[0]] = final_results.pop(k)` would fix your issue though. Replace `iteritems()` with `items()` if you're using python 3. – user387832 May 24 '18 at 16:40
  • Hi, after running the code, I get the warning "dictionary changed size during iteration" – Jim Ye May 24 '18 at 16:48
  • Hi, is there any way I can extract just the keyword from the dictionary, ignore the value, and use the same values after the colon to create a new dictionary called dict_list? – Jim Ye May 24 '18 at 16:54
  • Change `dict_list[k[0]] = final_results.pop(k)` to `dict_list[k[0]] = v`. – user387832 May 25 '18 at 01:35
  • Bravo, it works. But why is the printed result of v different when you assign `dict_list[k[0]] = v`? Sorry, it is a bit confusing. Or does this code mean: pick the value at index [0] and put it into v? – Jim Ye May 25 '18 at 07:13
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/171749/discussion-between-user387832-and-jim-ye). – user387832 May 25 '18 at 07:38
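
For the string-to-dictionary conversion discussed in the comments, the line-based rule (headword lines have no colon, content lines do) could be sketched as follows. This assumes the data is one block of text laid out like the listing in the question; `parse_dictionaries` is a hypothetical name, and the number after each headword is simply dropped.

```python
def parse_dictionaries(text):
    """Parse 'headword N' lines followed by indented 'key: count' lines."""
    dict_list = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if ":" in stripped:
            # Content line such as "ground: 1" belongs to the current headword.
            if current is None:
                continue  # skip entries that appear before any headword
            key, _, count = stripped.rpartition(":")
            dict_list[current][key.strip()] = int(count)
        else:
            # Header line such as "river 4": keep the word, drop the number.
            current = stripped.split()[0]
            dict_list[current] = {}
    return dict_list

sample = """river 4
    ground: 1
    water: 1
water 3
    river: 1
"""
print(parse_dictionaries(sample))
# {'river': {'ground': 1, 'water': 1}, 'water': {'river': 1}}
```

The result has the same shape as `dict_list` in the answer, so it can be passed to the similarity functions unchanged.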