How to convert dataframe columns into key:value strings?

Question

I am preparing dataset for openai model training. For an example, I have data in csv in below format

df = DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3], 'new':['apple', 'banana', 'pear']})

df
    bar foo new 
0   1   a   apple   
1   2   b   banana  
2   3   c   pear

I want to create a new column called 'prompt' with combine values of bar and foo columns

    bar foo new     prompt  
0   1   a   apple   foo: a, bar: 1
1   2   b   banana  foo: b, bar: 2
2   3   c   pear    foo:c, bar: 3

there is similar example here but it doesn't add the column names inside the combined column

You want to make an object from the columns? {col1: value1, col2: value2, col3: value3}? — dasfacc, Jan 17 '23 at 21:39
@dasfacc not exactly a object but string in the same format with curly brackets. `{col1: value1, col2: value2, col3: value3}` — g1soori, Jan 17 '23 at 22:21
@g1soori Your comment has curly braces but your question text doesn't. Which one did you want? — tdy, Jan 17 '23 at 22:39

tdy · Accepted Answer · 2023-01-17T23:20:23.330

df.apply is very popular but should be avoided whenever possible.

Instead use vectorized methods. Convert the relevant columns to dict and remove the quotes:

df["prompt"] = df[["foo", "bar"]].to_dict(orient="index")
df["prompt"] = df["prompt"].astype(str).replace(r"'", "", regex=True)

#   foo  bar     new            prompt
# 0   a    1   apple  {foo: a, bar: 1}
# 1   b    2  banana  {foo: b, bar: 2}
# 2   c    3    pear  {foo: c, bar: 3}

Note that your comment included braces but your post did not. If you also want to remove the curly braces, add them to the regex:

df["prompt"] = df["prompt"].astype(str).replace(r"[{'}]", "", regex=True)

#   foo  bar     new          prompt
# 0   a    1   apple  foo: a, bar: 1
# 1   b    2  banana  foo: b, bar: 2
# 2   c    3    pear  foo: c, bar: 3

Details

First convert the relevant columns to_dict oriented by index:

df["prompt"] = df[["foo", "bar"]].to_dict(orient="index")

#   foo  bar     new                  prompt
# 0   a    1   apple  {'foo': 'a', 'bar': 1}
# 1   b    2  banana  {'foo': 'b', 'bar': 2}
# 2   c    3    pear  {'foo': 'c', 'bar': 3}

Then use astype to convert it to str type and replace the dict symbols:

df["prompt"] = df["prompt"].astype(str).replace(r"[{'}]", "", regex=True)

#   foo  bar     new          prompt
# 0   a    1   apple  foo: a, bar: 1
# 1   b    2  banana  foo: b, bar: 2
# 2   c    3    pear  foo: c, bar: 3

score 0 · Answer 2 · answered Jan 17 '23 at 22:30

0

df = pd.DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3], 'new':['apple', 'banana', 'pear']})
df['prompt'] = df.apply(lambda x: x.to_json(), axis=1)
df

This would work? You get:

    foo bar new     prompt
0   a   1   apple   {"foo":"a","bar":1,"new":"apple"}
1   b   2   banana  {"foo":"b","bar":2,"new":"banana"}
2   c   3   pear    {"foo":"c","bar":3,"new":"pear"}

answered Jan 17 '23 at 22:30

Vetsin

2,245
1
20
24

1

Its short and sweet and works for my example. but I prefer the solution from tdy as it can filter out the required columns instead of combining all the columns. thanks for the response. – g1soori Jan 18 '23 at 00:06

score 0 · Answer 3 · answered Jan 17 '23 at 23:16

0

Combine multiple cells in a row using apply and lambda

df['prompt']=df.apply(lambda k: 'foo: ' + k['foo'] + ', bar: ' + str(k['bar']), axis=1)

This will work?

  foo  bar     new          prompt
0   a    1   apple  foo: a, bar: 1
1   b    2  banana  foo: b, bar: 2
2   c    3    pear  foo: c, bar: 3

answered Jan 17 '23 at 23:16

Hasnain

21
5

3

My actual csv file has 50 columns so I don't want to specify each column in the lambda. thanks for the response – g1soori Jan 18 '23 at 00:08

How to convert dataframe columns into key:value strings?

3 Answers3

Details