
Here's the dataframe I have:

import pandas as pd

fruits=pd.DataFrame()
fruits['month']=['jan','feb','feb','march','jan','april','april','june','march','march','june','april']
fruits['fruit']=['apple','orange','pear','orange','apple','pear','cherry','pear','orange','cherry','apple','cherry']
fruits['price']=[30,20,40,25,30,45,60,45,25,55,37,60]

fruits

I want to shuffle the rows of the dataframe while keeping rows with the same month together. In other words, a two-level shuffle: first the order of the months is shuffled, then the rows within each month are shuffled among themselves.

The output dataframe should look something like this:

fruits_new=pd.DataFrame()
fruits_new['month']=['april','april','april','feb','feb','jan','jan','march','march','march','june','june']
fruits_new['fruit']=['cherry','pear','cherry','pear','orange','apple','apple','orange','orange','cherry','pear','apple']
fruits_new['price']=[60,45,60,40,20,30,30,25,25,55,45,37]

fruits_new
  • Please don't provide your data as images. Provide it as copyable text which can then be reproduced on another machine. pandas can't read your images; I cannot reproduce your data. See https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples/20159305#20159305. – ifly6 Jul 21 '21 at 19:17
  • [sort_values](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sort_values.html) – It_is_Chris Jul 21 '21 at 19:18
  • Also show the output as properly formatted text in the question. External links may break in the future which would make the question and answer(s) useless. – Michael Butscher Jul 21 '21 at 19:19
  • Modified the question, please have a look, thanks – Medha Chippa Jul 21 '21 at 19:40
  • Welcome to the community. Please try to make your questions clearer and more structured: What do you need? Which problems did you encounter? Examples/code. Also, please don't start a question with a bunch of code. First you need to explain what your problem is. – Òscar Raya Jul 22 '21 at 10:14

1 Answer

You can use pandas.DataFrame.sample with frac=1. sample returns a random selection of the dataframe's rows, and frac=1 makes it return all of them, so the result is the full dataframe in shuffled order.

>>> df.sample(frac=1)

SAMPLE RUN:

#Initial dataframe
   0  1  2
0  5  6  A
1  5  8  B
2  6  6  C
3  6  9  D
4  5  8  E

>>> df.sample(frac=1)
#After shuffle
   0  1  2
0  5  6  A
4  5  8  E
1  5  8  B
3  6  9  D
2  6  6  C
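
For a reproducible shuffle, a seed can be passed through the random_state parameter of sample. The following is a minimal sketch that rebuilds the small dataframe from the sample run; the seed value 0 and the variable names are arbitrary choices for illustration.

```python
import pandas as pd

# Rebuild the small dataframe from the sample run above.
df = pd.DataFrame([[5, 6, "A"], [5, 8, "B"], [6, 6, "C"], [6, 9, "D"], [5, 8, "E"]])

# frac=1 returns every row; random_state fixes the seed so the
# same shuffled order is produced on every run.
shuffled = df.sample(frac=1, random_state=0)

# The original index labels travel with the rows. Drop them if a
# fresh 0..n-1 index is wanted after shuffling.
shuffled_reset = shuffled.reset_index(drop=True)

print(shuffled)
print(shuffled_reset)
```

Note that this shuffles all rows independently; it does not keep rows with equal values together.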
ThePyGuy
  • how can I ensure that all the rows with say the same column 0 value appear together? – Medha Chippa Jul 21 '21 at 19:39
  • That is not a shuffle then, that's sorting/grouping. Try: `fruits.sort_values('month')` – ThePyGuy Jul 21 '21 at 19:43
  • sort_values() sorts them in either ascending or descending order. I would like to shuffle them according to a random seed value, something along the lines of using multi-indexing and then shuffling the outer index and the inner index. – Medha Chippa Jul 22 '21 at 08:57
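
The seeded two-level shuffle described in the last comment could be sketched as follows. This is not from the original answer: the function name two_level_shuffle and its seed parameter are hypothetical, and the sketch uses a NumPy random generator rather than a MultiIndex. It first permutes the distinct month values, then permutes the rows inside each month.

```python
import numpy as np
import pandas as pd

fruits = pd.DataFrame({
    "month": ["jan", "feb", "feb", "march", "jan", "april",
              "april", "june", "march", "march", "june", "april"],
    "fruit": ["apple", "orange", "pear", "orange", "apple", "pear",
              "cherry", "pear", "orange", "cherry", "apple", "cherry"],
    "price": [30, 20, 40, 25, 30, 45, 60, 45, 25, 55, 37, 60],
})


def two_level_shuffle(df, key, seed=None):
    """Shuffle group order, then shuffle rows within each group.

    Hypothetical helper for illustration; `key` is the grouping column.
    """
    rng = np.random.default_rng(seed)

    # Level 1: random order for the distinct key values.
    keys = df[key].unique().tolist()
    group_order = [keys[i] for i in rng.permutation(len(keys))]

    # Level 2: random row order inside each group.
    parts = []
    for value in group_order:
        group = df[df[key] == value]
        parts.append(group.iloc[rng.permutation(len(group))])

    # Concatenating the groups keeps equal keys adjacent.
    return pd.concat(parts)


shuffled = two_level_shuffle(fruits, "month", seed=42)
print(shuffled)
```

Calling the function again with the same seed gives the same order; a different seed (or seed=None) gives a different one. Append .reset_index(drop=True) if the original row labels are not needed.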